PREFACE


Fibrous proteins represent a substantial subset of the human proteome. 
They include the filamentous structures found in animal hair that act as a 
protective and thermoregulatory outer material. They are responsible for 
specifying much of an animal’s skeleton, and connective tissues such as 
tendon, skin, bone, cornea and cartilage all play an important role in this 
regard. Fibrous proteins are frequently crucial in locomotion and are 
epitomised by the muscle proteins myosin and tropomyosin and by elastic 
structures like titin. Yet again the fibrous proteins include filamentous 
assemblies, such as actin filaments and microtubules, where these provide 
supporting structures and tracks for the action of a variety of molecular 
motors. 

It is nearly 20 years since this field was fully reviewed and there have 
been very significant advances in that time. The present book, therefore, 
represents one of a set of three volumes in the Elsevier ‘Advances in Protein 
Chemistry’ series covering the entire fibrous protein field. These are 
entitled: 


Fibrous Proteins: Coiled-Coils, Collagen and Elastomers 
Fibrous Proteins: Muscle and Molecular Motors 
Fibrous Proteins: Amyloids, Prions and β-Proteins 


The present volume covers ‘Coiled-Coils, Collagen and Elastomers’. The 
first few Chapters describe the importance of α-helical coiled-coil proteins 
with examples of the intermediate filament proteins, the spectrin 
superfamily and fibrin/fibrinogen. It is shown that the design principles 
governing the structures of coiled-coil proteins are largely discernible and 
can be specified with a high degree of confidence, thanks in large part to 
the wealth of crystal structures now available. Within the connective tissues 
covered here, constituents of defining importance mechanically are 
collagen fibrils and networks, and elastic fibres. Details are given of crystal 
structures of collagen peptides and the effects on conformation of the 
precise sequence of the distinct constituent triplets. The ultrastructures of 
connective tissues are largely defined by the spatial arrangement of the 
collagen fibrils and networks, and these too are elucidated in some detail. 
The final part of the book covers elastic fibres with their elastin cores and 
fibrillin-containing microfibril palisades. A common theme throughout is 
the increased characterisation of the structures and functions of mutants. 
Some of these occur naturally and lead to disease, whilst others have been 
genetically engineered in order to study design principles. 




The complete set of three books will provide a compendium of up-to- 
the-minute information on the entire fibrous protein field. Each Chapter, 
which is clearly written, fully illustrated and with a comprehensive citation 
list, is by an acknowledged authority in the field. It is our hope that, 
together, these books will enable valuable comparisons to be made, they 
will allow general principles to be elucidated and they will help to take the 
fibrous protein field forward in good shape into the 21st Century era of 
post-genomics, molecular medicine and nanoscience. 


David Parry 
Massey University 
New Zealand 


John Squire 
Imperial College London 
United Kingdom 
ABSTRACT 


Coiled-coil proteins, collagen, and elastomers together comprise an 
important subset of the fibrous proteins. The first group (the α-fibrous 
coiled-coil proteins) is widely distributed in nature and, indeed, the 
characteristic heptad motif has been recognized as an oligomerisation 
motif in fibril-forming collagens. This volume has selected a number of 
the α-fibrous proteins for detailed discussion, including intermediate 
filament proteins, the spectrin superfamily, and fibrin/fibrinogen. Of 
particular interest is the growing realization that the design principles governing 
the structures of these coiled-coil proteins are now largely discernible and 
can be specified with a high degree of confidence, due in large part to the 
wealth of crystal structure data now available. Within the connective tissues 
covered in this volume, two constituents of defining importance me- 
chanically are the collagen fibrils/networks and the elastic fibers. Crystal 
structures of collagen peptides have been published and are described. 
The effects of the precise sequence of the distinct constituent triplets on 
molecular conformation have also become clearer. The ultrastructures 
of connective tissues are largely defined by the spatial arrangement of 
the collagen fibrils and networks, and this is elucidated here in some 
detail. The elastic fibers with their elastin cores and fibrillin-containing 
microfibril palisades are also described. A theme underlying all of the 
proteins discussed in this volume is the significantly increased effort 
to characterize the structures and functions of mutants. Some of these 
occur naturally and lead to various disease states, while others have been 
genetically engineered in order to study design principles. 
I. INTRODUCTION 


Over the past decade, rapid progress has been made in gaining an 
understanding of the structure and function of fibrous proteins and it is 
now timely to produce a comprehensive treatise on the subject. Towards 
that end, three volumes of Advances in Protein Chemistry have been 
commissioned by Elsevier on the general theme of “Fibrous Proteins 
and Related Structures.” The first volume focused on “Molecular Motors 
and Muscle” and was edited by Squire and Parry. This volume, the second in 
the series, examines “Coiled Coils, Collagen, and Elastomers” (edited by 
Parry and Squire). The third volume, edited by Kajava, Squire, and Parry, 
discusses “Amyloids, Prions, and β-Proteins.” Together, these volumes 
cover most of the key developments that have occurred in recent years 
and a wealth of information is provided for the reader by experts in the 
field. It is hoped that these three volumes will provide a solid and up-to- 
date basis for further studies and will help to identify areas requiring new 
insights. 


II. α-FIBROUS PROTEINS 


Fibrous protein sequences are often characterized by the presence of 
simple repetitive motifs. Some are exact in length and/or sequence, but 
others are only approximate and display considerable variation. Some 
motifs contain residues that are absolutely conserved in some positions, 
whereas in others it is only the sequence character that is maintained over 
the repeat length. In many fibrous proteins the repeats occur contigu- 
ously, whereas in others they are found widely separated in the sequence. 
The varieties of sequence repeat that have been observed are typed and 
catalogued here by Parry (Chapter 2). Each motif forms a discrete element 
of structure; in many instances, these are arranged helically with respect to 
one another. In many cases an elongate structure is formed, and this can 
lead naturally to molecular aggregation and the formation of functional 
filaments. 

The structures of many of these motifs have already been characterized 
by X-ray crystallography and nuclear magnetic resonance (NMR) meth- 
ods. A particular motif of note is the underlying heptad (and related 
nonheptad) substructure present in α-fibrous proteins. The design 
principles of this class of structure are becoming increasingly well understood 
in terms of amino acid sequence. Furthermore, the importance of the 
interactions that arise between and within chains to stabilize and specify 
the secondary and tertiary structure has become better appreciated. Much 
of this progress can be attributed to the detailed crystallographic studies 
that have been undertaken using fragments of native proteins and 
specifically designed variants. Lupas (Chapter 3) summarizes the latest structural 
information on α-fibrous proteins and, in so doing, reveals the stunning 
progress that has been made in recent years. The wide variation in confor- 
mation exhibited in the crystal structures has challenged what we really 
mean by the term coiled coil, and it is clear that the spectrum of structures 
extends well beyond that which was initially considered "standard." As 
Lupas notes, we now have fibers, zippers, tubes, sheets, spirals, funnels, and 
rings that are all specified by coiled-coil packing arrangements. Computer 
analyses of the amino acid sequences of these proteins have begun to 
provide rules for distinguishing one multimer from another, but a chal- 
lenge still exists in recognizing imperfect coiled coil repeats in a sequence. 
Indeed, breaks in these underlying substructures (stutters, stammers, skips) 
have made recognition a much more difficult exercise than would have 
been apparent after the first sequence of an α-fibrous protein (tropomyosin) 
was derived some 30 years ago. This, of course, displayed a perfect 
heptad repeat, but tropomyosin is now seen as an exception to the rule. The 
physical significance of stutters and skips (i.e., heptad and nonheptad 
phase discontinuities) is reasonably well understood in terms of local 
unwinding of the constituent α-helical strands. However, no crystal structure 
of a stammer region has yet been determined. 

Woolfson’s review (Chapter 4) neatly complements that of Lupas. He 
describes the real progress that has been made on the design principles 
underlying the coiled-coil conformation. In particular, he addresses the 
rationale behind the presence of certain residues or residue types in 
specific positions in the underlying heptad substructure characteristic of 
many α-fibrous proteins. These features drive the assembly of the 
constituent right-handed α-helices towards a left-handed multistranded 
structure, known here as a canonical coiled coil. However, the heptad repeat 
is not the only one of importance that can lead to a coiled-coil structure. 
Indeed, nonheptad repeats may result in noncanonical coiled coils 
that can lack either left-handed or regular geometry. For example, right- 
handed coiled coils can be formed from hendecad (11 residue) sequence 
repeats. The distribution of charged and apolar residues gives rise to 
amphipathic α-helices for both heptad and nonheptad repeats, but subtleties 
in the presence of branched or unbranched apolar residues in positions 
a and d (for instance) and appropriately charged residues in 
positions e and g (or sometimes in positions b and c, also) determine the 
appropriate number of strands, the formation of homo- or hetero- 
multimers, their relative axial alignment, and their relative polarity. The 
coiled coil has proved to be an ideal system to explore the role(s) played 
by individual amino acids in determining both structure and function. 
Woolfson’s review provides a summary of the state of the art in coiled-coil 
design, and provides a keen insight into both the future potential of 
design methods and the challenges that they face. 
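
The heptad bookkeeping described above can be illustrated with a minimal sketch. It assumes an unbroken heptad repeat (no stutters, stammers, or skips) and a deliberately crude set of apolar residues; the function names and the test peptide are hypothetical and are not taken from any of the chapters.

```python
from collections import Counter

# Assumed apolar set, for illustration only; real analyses use finer scales.
APOLAR = set("AVLIMFWC")
HEPTAD = "abcdefg"


def assign_heptad(sequence: str, start_register: str = "a") -> list[tuple[str, str]]:
    """Pair each residue with a heptad position, assuming an unbroken repeat."""
    offset = HEPTAD.index(start_register)
    return [(res, HEPTAD[(offset + i) % 7]) for i, res in enumerate(sequence)]


def apolar_fraction_by_position(sequence: str, start_register: str = "a") -> dict[str, float]:
    """Fraction of apolar residues found at each heptad position."""
    totals, apolar = Counter(), Counter()
    for res, pos in assign_heptad(sequence, start_register):
        totals[pos] += 1
        if res in APOLAR:
            apolar[pos] += 1
    return {pos: apolar[pos] / totals[pos] for pos in HEPTAD if totals[pos]}


def best_register(sequence: str) -> str:
    """Pick the starting register that maximises apolar content at a and d."""
    def core_score(register: str) -> float:
        frac = apolar_fraction_by_position(sequence, register)
        return frac.get("a", 0.0) + frac.get("d", 0.0)
    return max(HEPTAD, key=core_score)


# Hypothetical designed peptide with leucine at every a and d position.
peptide = "LKELEDK" * 4
print(best_register(peptide))
print(apolar_fraction_by_position(peptide, "a"))
```

A scan of this kind only highlights the apolar a/d stripe; it says nothing about strand number, polarity, or partner specificity, which depend on the subtler features noted in the text.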

Intermediate filament proteins, and others intimately associated with 
them, have become of major interest to cell biologists, especially in the 
past two decades. However, the history of research on wool and the hard 
α-keratins goes back much further, to the mid 1930s. This field was small, 
very specialized, and not well understood by the wider scientific commu- 
nity. This situation changed dramatically soon after the discovery of a third 
set of filaments in cells that were intermediate in size between the large 
microtubules and the small microfilaments. It was recognized that the 
wool and hair keratins were closely related by both sequence and structure 
to the intermediate filaments. This was a defining moment in the field. 
Almost overnight, the detailed structural work available on the hard 
α-keratins was related to the chemical data that were so easily gained 
(in comparison to the hard α-keratins) on the intermediate filaments. 
This understanding revolutionized the field and led to many subsequent 
significant advances. 

Parry (Chapter 5) describes the structure of the intermediate filament 
proteins seen in hair, skin, vimentin, and neurofilaments (among others). 
Here he details the data that allow the role of individual residues and 
short lengths of sequences in these proteins to be understood in terms of 
secondary and tertiary structure, aggregation characteristics, and function. 
Site-directed mutagenesis studies have been particularly important in 
this regard, as has a comparison of the large number of intermediate 
filament sequences now available. These studies have enabled potentially 
conserved inter- and intrachain ionic interactions to be recognized, as well as 
specifying those residues involved in stabilizing the four particular modes 
of molecular aggregation characterized from crosslinking data. 

Intermediate filament associated proteins (IFAPs) are also key com- 
ponents of the system. Green, Jones, and colleagues (Chapter 6) provide 
a cogent account of the structure-function relationships of a number of 
these, including filaggrin, trichohyalin, and increasingly numerous mem- 
bers of the plakin family. A host of other IFAPs have also been character- 
ized in recent years, and these too are discussed in detail. It has long been 
known, of course, that IFAPs have a critical role in linking together 
elements of the intermediate filament scaffold in cells. What has been 
less well understood, but which is now made very evident from the review 
of Green et al., is that the IFAPs have a large range of functions far beyond 
those relating purely to structural connections. Indeed, Green et al. show 
that IFAPs can be molecular motors, chaperones, enzymes, adaptors, 
membrane receptors, or proteins that relate to cell homeostasis and 
metabolism. IFAPs, originally the poor relation to the IF proteins, have 
now very clearly come into their own. It is only by understanding the 
interactions of the IF-IFAP complex that a full understanding can be 
gained of this complex group of biological tissues that include, among 
many others, skin and hair. 

Triple-helical α-helical coiled coils are achieved in vivo in two very 
distinct ways. In the first, a single chain folds back on itself to form a 
structure with two similarly and one oppositely directed “strand.” These 
coiled coils are exemplified by the (related) structures seen in the rod 
domain of members of the spectrin superfamily of α-spectrin, α-actinin, 
and dystrophin/utrophin, each of which is derived from a common 
α-actinin ancestor (Broderick and Winder, Chapter 7). The dumbbell-shaped 
molecules have a flexible rodlike structure with membrane-anchoring 
capabilities. This superfamily is specifically characterized by 
the presence of contiguous spectrin repeats, each of which forms the 
three-α-helix motif noted above. Actin-binding domains (N-terminal domain) 
are found in all members except α-spectrin; EF hands (C-terminal 
domain) are found in all members except β-spectrin. Some members of 
the superfamily also contain additional domains specific to their particular 
function, thus extending the group’s role considerably beyond that of 
simply crosslinking actin filaments together in order to form bundles. As 
with many other fibrous proteins, considerable efforts have been made 
in recent years to understand the effects produced by mutations observed 
in various disease states, not least of which (in dystrophin) relates 
to muscular dystrophy. The three-dimensional conformations of actin- 
binding domains, for example, exhibited in most members of the spectrin 
superfamily have now been determined at atomic resolution by either 
X-ray crystallography or NMR studies. These have been very informa- 
tive in identifying residues that have special importance structurally and 
functionally. 

The second (and more common) situation of a triple-helical α-helical 
coiled coil occurs when three chains aggregate, usually parallel to one 
another and with the chains in axial register. An example in this case is 
provided by the blood-clotting protein fibrin/fibrinogen. Marked progress 
on the structure and function of fibrin/fibrinogen has been made in 
recent times and, as a result, Weisel’s review of the field (Chapter 8) will 
be seen as particularly timely to the research community. Structurally, 
X-ray diffraction studies on a variety of fibrinogen fragments have revealed 
details of the molecular conformation, which consists essentially of a 
central region linked via triple-stranded coiled-coil domains to globular 
domains lying at each end of the 45-nm-long molecule. Consequently, 
chain connectivity is now well understood, as are the disulphide bond 
connections and the domain organization. This picture has assisted in 
advancing our functional knowledge of fibrinogen, especially with regard 
to its metabolism and biosynthesis. Weisel reports that research is now very 
close to revealing the molecular mechanisms of fibrin polymerization. 
Fibrinogen/fibrin represents a rapidly advancing field, which has proved 
both exciting for those involved and rewarding in terms of our increased 
understanding of one of nature’s most important molecules. 


III. CONNECTIVE TISSUE 


Connective tissues are a family of closely related but structurally/func- 
tionally diverse biological materials. Their importance in defining the 
skeleton, our bodies, and our mode of locomotion is beyond dispute. 
Likewise, the mechanical attributes of connective tissues in being able to 
withstand external and internal stresses, and in providing integrity to 
almost every tissue in the body, are unequivocally of great importance 
in sustaining life as we know it. This volume is concerned with disseminat- 
ing the latest knowledge on crucial components of connective tissues— 
collagen (Chapters 9-11) and elastic fibers (Chapters 12 and 13). 

It rarely fails to astound those working in the field of connective tissue 
that nature has been able to produce such a diverse range of mechanical 
and functional materials using essentially the same components— 
collagens, proteoglycans, elastic fibers, minerals, water, cells, and a variety 
of rather more minor proteins. Admittedly, these components are used in 
very different proportions in various tissues, and genetically distinct types 
of collagen as well as diverse proteoglycans (for example) are incorporated 
where appropriate. The three-dimensional organization of these elements 
is also highly specialized and this has evolved to allow the function of each 
tissue to be optimized. In some cases, the collagen molecules aggregate to 
form fibrils, sometimes with a very narrow distribution of sizes (as in the 
cornea), but other times with a wide range of diameters (as in tendons). In 
other cases, the collagen molecules form distinct and well characterized 
networks, basement membrane being one of particular note. For largely 
functional reasons, collagens are increasingly found in heteropolymeric 
rather than homopolymeric structures. These heteropolymeric fibrils 
and/or networks interact in a highly precise manner with specific proteo- 
glycans to specify superaggregates that determine many of the mechanical 
attributes displayed by the intact tissue. Collagen fibrils assemble to form 
even larger ensembles, and they do so in a manner directly related to their 
function. For example, the fibrils can form unidirectional fibers (as in 
tendons), two-dimensional layers (as in skin), or more complex three- 
dimensional arrangements (as in cartilage and the cornea). Elastic fibers 
are present in virtually all connective tissues, though the amount present 
varies greatly. These fibers are present most often (and in the highest 
quantities) in tissues that are frequently flexed, as in the necks (ligamen- 
tum nuchae) of grazing animals. The variation and general decrease in 
water content with age also impinges directly on the mechanical attributes 
of the tissue as a whole. 

The classical concept that collagen molecules all have an extensive 
region in their amino acid sequences with an uninterrupted triplet 
substructure of the form (G-X-Y)n has not been tenable for many years. 
Indeed, the collagen types that have now been characterized (27 at the 
latest count) vary significantly from one another in this regard. For 
example, fibril-forming collagens tend to show a very long and continuous 
triplet substructure (338-341 triplets). In contrast, nematocyst minicolla- 
gens contain only 14 triplets. Types IV and VII collagen are different 
again, as they contain multiple (more than 20) breaks in the phasing of 
their triplet substructures. These have the effect of enhancing molecular 
flexibility. In the case of C1q, a single discontinuity results in a rigid kink 
being formed (Brodsky and Persikov, Chapter 9). Thus, there is consider- 
able variation on the triplet theme, not only in the collagens but also in 
many other proteins containing triplet subdomains. 
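
The idea of breaks in the phasing of the triplet substructure can be made concrete with a short sketch. It assumes that an uninterrupted G-X-Y stretch is recognised simply by glycine recurring at every third residue; the function names, the threshold `min_glycines`, and the test sequence are hypothetical illustrations rather than a published method.

```python
def glycine_phase_runs(sequence: str, min_glycines: int = 2) -> list[tuple[int, int]]:
    """Return (start_index, n_glycines) for each maximal run in which Gly
    recurs at every third residue, i.e. a candidate uninterrupted G-X-Y stretch."""
    runs = []
    for i, res in enumerate(sequence):
        if res != "G":
            continue
        if i >= 3 and sequence[i - 3] == "G":
            continue  # this Gly continues a run that started earlier
        j, count = i, 0
        while j < len(sequence) and sequence[j] == "G":
            count += 1
            j += 3
        if count >= min_glycines:
            runs.append((i, count))
    return runs


def phase_breaks(sequence: str, min_glycines: int = 2) -> list[int]:
    """Start indices of runs that are out of phase with the preceding run."""
    runs = glycine_phase_runs(sequence, min_glycines)
    return [start for (prev, _), (start, _) in zip(runs, runs[1:])
            if (start - prev) % 3 != 0]


# Hypothetical peptide: one residue is missing from the third triplet.
peptide = "GPPGPPGAGPPGPP"
print(glycine_phase_runs(peptide))
print(phase_breaks(peptide))
```

Note that `phase_breaks` reports only shifts in the Gly register; a triplet in which Gly is replaced without changing the register splits a run but is not counted as a phase break in this sketch.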

Crystal structure data on collagen peptides have revealed exciting infor- 
mation on the triple-helical structure and the manner by which it is 
stabilized through hydrogen bonding networks and water interactions. 
Interestingly, the helical parameters have been shown to be sequence-dependent, 
with Gly-Pro-Hyp sequences having 7/2 symmetry and more 
general sequences being more akin to 10/3. Allied to the latest data on 
collagen mutations, Brodsky and Persikov provide a comprehensive con- 
tribution on collagen structure and function that brings the field right up 
to date. 

Wess (Chapter 10) considers the structure of fibril-forming collagens. 
These generally constitute the bulk of the collagen present in a tissue; they 
have particular importance through their size and ultrastructural organi- 
zation in providing considerable mechanical strength to the tissue. The 
fibrils have an axial period of about 65-67 nm and diameters ranging up 
to about 500 nm. In all cases, the lateral dimensions are highly regulated, 
especially so for the uniform fibril diameters seen in the cornea, a factor 
related directly to transparency. Frequently, more than one collagen type 
is found in these fibrils. Sometimes they appear predominantly in the core 
of the fibril, but in other cases they are found on the fibril surface, where 
they can interact most easily with other molecules to provide the necessary 
functionality. FACIT collagen (Fibril-Associated Collagens with Inter- 
rupted Triple helices) is the terminology used to describe the latter type 
of molecules that bind specifically at a fibril surface. Examples include 
collagen types IX, XII, and XIV. The packing of collagen molecules has 
been determined using X-ray diffraction techniques. From these data it is 
possible to assess the manner in which these molecules group together to 
form subfibrillar elements, such as microfibrils. This is also described by 
Wess in Chapter 10. 

Other collagens, such as types IV, VI, VIII, and X, do not form fibrils. 
Instead, they aggregate to form meshes and open networks, thereby 
providing a selective molecular filter/barrier or a supporting structure 
for the tissue. The N- and C-terminal regions in these molecules lack 
a triplet substructure, but often play an important role in facilitating 
network formation through interactions between constituent molecules. 
Because they are often intrinsically disordered, elucidating details of the 
structure of this type of network is difficult (sometimes extremely so), but 
Knupp and Squire (Chapter 11) provide a comprehensive analysis of the 
pertinent data relating to the most common network-forming collagens. 
They also describe a possible collagen “segmented supercoil” motif that, 
in addition to the N- and C-terminal interactions, may be an important 
mode of intermolecular interaction. Other features of similarity and 
difference in network-forming collagens are highlighted. 

In order to understand more of the complex interrelationships defining 
the structure and function of connective tissues, it is necessary to consider 
not only collagen but also other components. In this volume, however, 
discussion has been confined to the role played by the elastic fibers. 
These are largely (but not totally) composed of two proteins: fibrillin 
molecules that assemble to form a circumferentially arranged picket fence 
of microfibrils, and a highly crosslinked mass of tropoelastin molecules 
that constitute the central core. Specifically, it is fibrillin-1 that is the major 
constituent of the microfibrillar component of elastic fibers. This glyco- 
protein is about 160 nm in length and consists of 47 EGF-like domains, 
seven TB domains, two hybrid motifs, and a putative hinge region rich in 
proline residues. At present, there is much debate on the precise arrange- 
ment of the fibrillin-1 molecules in the 10- to 12-nm diameter microfibrils 
and their mechanical attributes; Kielty et al. (Chapter 12) discuss the 
“hinged” and “staggered” models in some detail. Microfibril-associated 
proteins, of which an increasing number have now been identified, are 
also significant both in the assembly process and in defining the function 
of the microfibril in vivo. While both aspects are currently imperfectly 
understood, it is clear that progress is being made. The microfibrils, 
assembled under a partially cell-regulated process, provide an outer frame- 
work within which the deposition of tropoelastin molecules is directed, but 
it is relevant to note here that microfibrils are also present in substantial 
numbers in tissues that lack elastin. Mutations in fibrillin-1 have also been 
characterized and shown to cause Marfan syndrome, a heritable disease 
resulting in severe aortic, ocular, and skeletal defects. 

Tropoelastin, on the other hand, is a soluble, medium-size protein 
(about 60-70 kDa) that becomes intimately attached to other tropoelastin 
molecules through lysine-mediated crosslinks. Elastin, as it is then termed, 
constitutes the bulk of the elastic fibers (about 90%) and is a resilient, 
elastic, but insoluble assembly with an extraordinarily long half-life (70 
years). As noted earlier, elastin interacts with and is surrounded by the 
fibrillin-containing microfibrils and their associated proteins, thereby 
generating an entity that is perfectly designed for its biomechanical 
role in vivo. Debate in the elastin field centers on both the structural form 
adopted by elastin and the mechanism by which elasticity is achieved. 
Mithieux and Weiss (Chapter 13) discuss and compare the four major 
conformational models that have been proposed—random chain, liquid 
drop, oiled coil, and fibrillar. Each is structurally distinct, but all of them 
invoke entropic changes as the driving force (albeit in slightly different 
ways) to explain the mechanism by which elasticity is conferred to the 
elastic fibers. 

The discussion here focuses on the structure and function of individual 
components of connective tissue (collagens, elastin, and fibrillin) rather 
than trying to integrate these components into the functioning whole. 
This latter goal, of course, represents the Holy Grail for those working in 
the field, but it is only by understanding the components in greater detail 
that we are likely to comprehend and appreciate the working of the entire 
system of interacting elements. 


IV. SUMMARY 


Fibrous proteins provide a number of challenges to those seeking to 
understand them in detail at the molecular level. Not least of these 
challenges is trying to crystallize fibrous proteins in a form suitable for 
structural investigation using X-ray diffraction techniques. Even crystalli- 
zation of fragments has proved problematic, though real progress has 
now been made and some informative results have been obtained for 
intermediate filament proteins, muscle proteins, fibrinogen, collagen, 
and many other fibrous molecules. In some cases, NMR methods have 
allowed the crystallization step to be avoided, thus permitting proteins to 
be studied in solution. Of course, NMR methods are not without their 
particular problems and most structural data have been obtained using 
X-ray diffraction. 
In spite of these difficulties (or perhaps because of them), a suite of 
both chemical and physical methods has been devised to assist the 
research community in its quest to gain an in-depth understanding of the 
fibrous proteins. Some of these are theoretical and some experimental. 
Among the former methods are those based on bioinformatics. These 
methods use modeling and pattern recognition techniques to identify 
sequence and structural motifs previously discovered and characterized in 
detail. These motifs are often short in length and repeated consecutively 
many times. Indeed, these two features are characteristic of fibrous pro- 
teins (Chapter 2). Repeats are not confined to fibrous proteins and many 
are found in globular proteins (though these are usually quite long in 
comparison to those seen in the fibrous proteins). Consequently, they lead 
to domains of reasonable size (perhaps 20-50 residues but sometimes 
much larger). An appreciable number of α-fibrous structures have now 
been solved at atomic resolution (Chapter 3), allowing some of the key 
structural principles to be recognized and incorporated in de novo design 
methods (Chapter 4). Much progress has also been made on specific 
members of the class of α-fibrous structures. These include the two-stranded 
(double-helical) intermediate filament molecules (Chapter 5) 
and some of their associated proteins (Chapter 6), the single-stranded 
(but triple-helical) spectrin/α-actinin/dystrophin molecules (Chapter 7), 
and also the three-stranded (triple-helical) fibrin/fibrinogen molecule 
(Chapter 8). 

Connective tissues, as noted earlier, are composite materials par excel- 
lence. Before the entire tissue can be appreciated in its full glory, it is 
necessary to probe details of the structure, function, and role of each 
component. This has been done here for collagen (Chapter 9), where the 
nature of the underlying triplet structure has been systematically investi- 
gated and crystal structures reported. Chapter 10 deals with collagen 
molecules that are fibril-forming, whereas Chapter 11 concentrates on 
those that form networks. Both forms of aggregation are functionally 
important, although the emphasis mechanically is clearly quite different 
in the two cases. Elastic fibers, defined by an outer palisade of fibrillin- 
containing microfibrils and an inner core of tropoelastin, are described in 
Chapters 12 and 13. These fibers are rarely accorded the importance that 
they merit, but the contributions in this volume will surely do them the 
justice they deserve. 
ABSTRACT 


The amino acid sequences of increasingly large proteins have been 
determined in recent years, and it has become more and more apparent 
that within these sequences nature has employed only a finite number of 
structural/functional motifs. These may be strung along the sequence in 
tandem and, in some cases, several hundred times. In other instances, the 
positions of the motifs show little obvious order as regards their relative 
linear arrangement within the sequence. The observed sequence repeats 
have been shown to vary in size over at least two orders of magnitude. It is 
shown here that the repeats can readily be classified on the basis of 
character, and five distinct groups have been identified. The first of these 
(Type A) represents those motifs that are fixed in length and conserved 
absolutely in sequence (>99%); the second (Type B) includes motifs that 
are also fixed in length, but where absolute sequence conservation occurs 
only in some positions of the repeat. The third category (Type C) contains 
fixed length motifs, but the character of only some of the positions in the 
motif is maintained. The fourth group (Type D) includes motifs that have 
nonintegral lengths. The fifth class (Type E) contains motifs, often
displaying some variations in their lengths even within a single species, which
maintain a discrete structural form related directly to their function. Ex- 
amples are presented for each category of repeat, and these are drawn 
almost exclusively from the fibrous proteins and those proteins that are 
normally associated with them in vivo. 


I. INTRODUCTION 


Fibrous proteins are often designed specifically to aggregate and form 
filamentous assemblies. Examples of these include fibrous proteins in the 
collagen and α-fibrous classes. Filament formation requires that
complementary groups in two or more molecules lie in appropriate axial and
azimuthal orientations to facilitate interaction. For example, the presence 
of periodic clusters of acidic residues in one molecule might lie close to 
periodic clusters of basic residues in another. This could facilitate the 
formation of a network of stabilizing intermolecular ionic interactions and 
thereby specify a unique mode of aggregation. Likewise, patches of apolar 
residues in different molecules might lie in positions where the two areas 
could come together and shield both regions from the aqueous environ- 
ment, thereby stabilizing and specifying the mode of assembly. Comple- 
mentarity of shape and hydrogen-bonding potential are other means used 
in vivo to provide specificity of interaction and assembly. However, in 
order for these possibilities to be realized for filament-favoring molecules, 
two special sequence-related features are required. First and by definition, 
a filamentous structure implies the presence of a relatively short-range 
motif repeated contiguously. Such a motif will adopt a particular
conformation, often one of the well-known elements of secondary structure
(α-helical, β-strand, or collagen-like). Since such residues or clumps of
residues would naturally favor similar environments, it is extremely likely 
that the elements will be related to one another helically, and that an 
elongate molecular structure will be formed. Secondly, a regular pattern 
of intermolecular interactions is needed for filamentous assembly to 
occur; this also implies a corresponding regularity in the underlying amino
acid sequences of the interacting molecules. 

Sequence repeats in proteins vary greatly. Some are very short and 
others are extremely long. The repeats can be exact or approximate, they 
can contain residues that are absolutely conserved in some positions but 
not in others, and they can be extremely imprecise in other than a 
conserved general character. Some repeats are fixed in length and others 
are not. Some are primarily functional and, in an apparent contradic- 
tion in terms, can also vary in length. Some repeats occur many times 
contiguously while others are found only a small number of times
consecutively. Still others are found distributed singly in what appears to be (but
is clearly not) a random manner along the entire length of the protein 
chain. Many motifs are now well established and these have been recog- 
nized in quite diverse proteins. The PROSITE database gives a summary of 
these motifs (http://www.expasy.org/prosite/) and, as such, provides the 
researcher with a first indication of the repeats that exist in a newly 
determined protein sequence (Hulo et al., 2004; Sigrist et al., 2002). 

In order to bring some order to what is clearly a diverse array of 
observations, this review has attempted to categorize the repeats. Five such 
classes have been recognized. Each will be dealt with in turn and appro- 
priate examples presented. No attempt has been made to list the repeats 
observed in the sequences of all proteins; instead, representative examples 
are presented from the fibrous proteins in particular, and the proteins 
associated with them, in order to illustrate key features of some of the 
more interesting structures. First, however, these classes are defined. 


II. SEQUENCE REGULARITIES 


A. Fixed Length and Exact Sequence Repeats (Type A Repeat) 


Type A repeats contain exact numbers of residues (described here as 
quantal). Exact copies of the repeat are often, though not invariably, strung 
along the sequence contiguously. The extent of the motif varies over several 
orders of magnitude and ranges from two residues in some silks (Parry, 
1979) to over 500 residues in epiplakin (Fujiwara et al., 2001). In the latter 
case, the repeat (occurring five times in tandem) is conserved at the 99.6% 
level. Any repeat conserved at the 99% level or above is considered here to 
be exact. Motifs longer than those in epiplakin will undoubtedly be found, 
and these too will arise from gene duplication events. 


B. Fixed Length Repeats but Residues Absolutely Conserved in Only Some 
Positions of the Motif (Type B Repeat) 


The classic example of a Type B repeat is that presented by the α-chains
in collagen. Three such chains aggregate to form a triple-helical collagen
molecule, but they can only do so if glycine is positioned in every third
residue of the sequence (Hulmes, 1992). This is because glycine is located
internally and, due to its size, is the only residue that can fit
stereochemically into the space available. In the Type I collagen α-chain,
the repeat occurs 338 times contiguously.


C. Fixed Length Repeats with Sequence Character Maintained
(Type C Repeat) 


Type C repeats are very common in proteins. They are quantal in 
length, but the repeats themselves do not contain residues that are con- 
served absolutely in any position. However, several positions within the 
repeats are strongly conserved in character. A classic example of a Type C
repeat is that given by the heptad substructure in α-fibrous proteins. This
has the form (a-b-c-d-e-f-g)n, with the a and d positions generally
occupied by apolar residues, and the e and g positions by charged or
hydrophilic residues. The heptad is characteristic of an α-helical conformation
(Cohen and Parry, 1986, 1990; Lupas, 1996), but comparison of any two
sequences with a heptad substructure generally reveals only about 15-20%
identity. The motif also implies that several α-helices will aggregate to
form a multistranded left-handed coiled-coil rope to shield the apolar
stripes on the surface of the α-helices from the aqueous environment.
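
The distinction drawn so far between the three fixed-length classes can be illustrated with a minimal Python sketch. It is not a method taken from this review: the function name, the use of a column consensus to measure identity, and the toy sequences are all assumptions made for illustration. Only the 99% threshold for an "exact" repeat comes from the text. The sketch also does not test conservation of residue character, so anything that is neither Type A nor Type B is simply reported as a candidate Type C.

```python
from collections import Counter


def classify_fixed_length_repeats(repeats, exact_threshold=0.99):
    """Assign aligned, equal-length repeat copies to Type A, B, or C.

    Hypothetical helper: identity is measured against the most common
    residue in each column, which is only one of several possible choices.
    """
    length = len(repeats[0])
    if any(len(r) != length for r in repeats):
        raise ValueError("repeats must all have the same length")

    columns = list(zip(*repeats))
    # For each position, count the copies that carry the most common residue.
    matches = sum(Counter(col).most_common(1)[0][1] for col in columns)
    identity = matches / (length * len(repeats))

    if identity >= exact_threshold:
        return "A"  # conserved at the 99% level or above
    if any(len(set(col)) == 1 for col in columns):
        return "B"  # at least one position conserved absolutely
    return "C"      # no absolute conservation (character not checked here)


print(classify_fixed_length_repeats(["AQ", "AQ", "AQ"]))                 # silk-like dipeptide
print(classify_fixed_length_repeats(["GPP", "GAK", "GER"]))              # collagen-like triplets
print(classify_fixed_length_repeats(["LKELEDK", "IRDVEAR", "VEKLKSE"]))  # heptad-like toy copies
```

In practice the copies would first have to be found and aligned, which this sketch takes as given.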


D. Nonintegral Repeats in Sequence Character (Type D Repeat) 


For many proteins, it is the geometry of the molecules that is para- 
mount; packing considerations are secondary. Consequently, there may be 
nonquantal periodicities of residues and, more often, differences in
residue types on the surface of the molecule. These are important in
specifying molecular interactions and assembly. Examples of the Type D repeat
are found for both the acidic residues and the basic residues in two major
coiled-coil segments of keratin (McLachlan, 1978; Parry et al., 1977) and
desmin/vimentin intermediate filament chains (McLachlan and Stewart,
1982). Provided that the interacting molecules display one of the possible
relative axial staggers, they will necessarily be stabilized in the filament by
clusters of intermolecular ionic interactions. Another example is that seen
for tropomyosin, which has a regular distribution of acidic residues and, to
a lesser extent, apolar residues (McLachlan and Stewart, 1976; Parry,
1975) that match the period exhibited by the actin monomers in the thin
filaments of muscle. This allows each actin monomer to be regulated by
tropomyosin in a similar manner.


E. Sequence Motifs of Variable Length in Proteins (Type E Repeat) 


A sequence repeat found in both fibrous and globular proteins is one in 
which a functional motif occurs, possibly several times in the sequence but 
often noncontiguously. This Type E repeat is exemplified by the Ca²⁺ EF
hand, which is a length of sequence specifying a pair of α-helices that
define a site capable of binding a Ca²⁺ (or sometimes a Mg²⁺) ion. Some
residues within this repeat are largely, but not absolutely, conserved while 
other positions show a wide range of tolerance to different residues. The 
absolute lengths of some Type E repeats vary, with small deletions or 
insertions possible without compromising function. 


III. EFFECT OF SEQUENCE REPEATS ON SECONDARY STRUCTURE,
AND IMPLICATIONS FOR TERTIARY AND QUATERNARY STRUCTURE 


A. Type A Repeats 


Short exact repeats of sequence generally adopt a well-defined element 
of secondary structure. If these repeats in the protein are arranged in 
tandem, then a helical arrangement is likely since this will result in similar 
spatial environments for each motif. A fibrous structure with a high axial 
ratio is the natural result and the repeat will primarily be a structural one. 
In the situations where the exact repeats are not consecutive, it is more 
probable that the motifs will be recognition or interaction sites, and hence 
have more of a functional than structural significance. On the other hand, 
long exact repeats lacking internal sequence substructure are likely to 
form discrete domain structures (possibly globular). Thus, the longer the 
repeat length, the more likely it is that the protein can be represented by a 
string of globular domains. It is unlikely in these circumstances that the 
motifs will be related to one another helically. 

Silks that contain short sequence repeats, often incorporating only a 
small subset of the available amino acids, seem relatively common. For 
example, Pachylota audouinii contains equal amounts of alanine and
glutamine residues, and this silk is thus believed to have an (A-Q)n substructure
(Lucas and Rudall, 1968). Since Pachylota has a β-structure and the repeat
has an even number of residues, the resulting β-sheets will have one face
composed entirely of alanine residues and the other entirely of glutamine
residues. β-Sheets in proteins commonly have a right-handed twist, and
aggregation with a second β-sheet of similar characteristics is common
in vivo. In such cases, the apolar faces of two interacting β-sheets (i.e.,
those composed solely of alanine residues) would be most likely to come
together. In so doing, the apolar residues would be shielded from the 
aqueous milieu and the assembly would gain stability. A repeat of this type 
would thus have structural implications rather than functional ones. 
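
The argument about sheet faces rests on the fact that successive side chains in a β-strand point to opposite sides of the sheet. A minimal Python sketch, using a hypothetical helper name and toy strands, shows why an even repeat length segregates the residues while an odd one does not.

```python
def sheet_faces(strand):
    """Split a beta-strand sequence into the residues on its two faces.

    Assumes an idealized strand in which side chains simply alternate
    between the two faces of the sheet.
    """
    return strand[0::2], strand[1::2]


# An (A-Q)n strand: the repeat has an even number of residues,
# so each face carries a single residue type.
print(sheet_faces("AQ" * 6))

# A toy three-residue repeat for contrast: every residue type
# appears on both faces.
print(sheet_faces("AQS" * 4))
```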

There are multiple phosphorylation sites in the tail domain of the medi- 
um and high molecular weight chain of neurofilaments (NF-H). In the 
majority of cases, the phosphorylation sites are serine residues within the 
repeating triplet motif of exact sequence K-S-P. None of these are
contiguously arranged, though pairs of them are found within a 14-residue repeat of
the form K-S-P-E-K-A-K-S-P-V-K-E-A-A (http://www.interfil.org/). 
This is conserved at the 89% level (i.e., a Type C repeat) and occurs 
nine times in tandem. The K-S-P repeat (Type A), which is primarily a 
functional one, represents a site of kinase action. This has significant 
implications in vivo. Indeed, phosphorylation/dephosphorylation events 
are important not only in neurofilaments, but also in intermediate fila- 
ments in general. In the former case, the state of phosphorylation seems to 
relate directly to axonal diameter, conduction velocity, and transport 
properties. 

An example of a longer exact repeat is found in chick scale keratin. The 
sequence contains a fourfold tandem repeat, each 13 residues long
(G-Y-G-G-S-S-L-G-Y-G-G-L-Y; Fig. 1). Interestingly, feather and scale
keratin share a common microfibril structure with a 3.4 nm diameter.


Fig. 1. The fundamental difference between the sequences of scale and feather
keratin is the presence of four consecutive repeats of a 13-residue motif in the former
but not in the latter. Gregg et al. (1984) have proposed that this 52-residue sequence
forms an eight-stranded antiparallel β-sheet, likely to have a right-handed twist. A
schematic of the conformation illustrates the confinement of glycine residues (in
yellow) to the β-turns and the serine residues (in blue) to the β-strands. A tyrosine
residue (in red) occurs at each β-turn and in alternate β-strands; these may be involved
in forming strong apolar interactions with other tyrosines in similar sheets, thereby
providing the scale keratin microfibrils with greater lateral organization, as observed
experimentally. Figure redrawn from the original of Gregg et al. (1984).


However, the packing of these elements in scale keratin is considerably
more regular. This has been associated with the four 13-residue repeats, 
since the sequences of the two proteins differ only in their inclusion in 
scale keratin and their absence in feather keratin. According to Gregg et al. 
(1984), the 52-residue region adopts an eight-stranded antiparallel β-sheet
structure with the tyrosine residues located at the turns. Tyrosines are
often considered to be “sticky” residues since they frequently form strong
hydrophobic interactions. It can be speculated, therefore, that the more
regular packing in scale keratin is specified in large part by interactions
between the β-sheets from different molecules. The sequence evidence
supports the concept that scale and feather keratin shared a common 
ancestor. As scales preceded feathers chronologically, the feathers would 
have evolved from scales through the deletion of the tandem repeat 
region. 

None of the very long sequence repeats found in proteins (perhaps
exemplified by the fivefold tandem 534-residue repeat in epiplakin;
Fujiwara et al., 2001) has a structure that has been determined
experimentally by either X-ray or nuclear magnetic resonance (NMR) means.
Nonetheless, it is possible to infer that sequences over 50 residues in 
length (let alone those more than 500 residues long) will form a structural 
domain (or perhaps several domains) to yield a structure that can be 
likened to a string of beads. This form automatically provides a protein 
with multiple opportunities to interact similarly with other proteins or with 
other biological macromolecules. Profilaggrin, a polyprotein precursor to 
filaggrin, is composed extensively of Type C repeats and can be likened 
somewhat to epiplakin in this regard (Gan et al., 1990; Rothnagel and 
Steinert, 1990). Human profilaggrin contains 10-12 tandem repeats, each 
312 residues long, and these are joined by short highly conserved linkers. 
Mouse profilaggrin, however, has even more repeats (18 to > 30) and two 
distinct types exist (250 and 255 residues in length). It must be empha- 
sized here, though, that profilaggrin is a precursor protein and not a 
functional one. Furthermore, its repeats are of the Type C, rather than the 
Type A, variety. 


B. Type B Repeats 


As noted earlier, the structural motif in Type I collagen α-chains is of
the form (G-X-Y)n, where X and Y can be almost any amino or imino acid
residue and n is 338. This triplet substructure currently represents one of
the best examples of a Type B repeat (i.e., a repeat in which one or more
residues in a motif is maintained absolutely for structural or functional
reasons). Some residues in the X or Y positions of the collagen α-chains,
however, show a preference for one position or the other; this may arise
either from stereochemical limitations (see, for example, the preference
for leucine, phenylalanine, and glutamic acid in the X positions, and for
glutamine, arginine, and lysine in the Y positions: Fietzek and Kühn, 1975;
Salem and Traub, 1975) or from posttranslational modifications (see, for
example, prolines in the X positions, but hydroxyprolines in the Y
positions). Each α-chain has a left-handed conformation closely akin to that
seen in polyglycine II (Rich and Crick, 1955). In vivo, three such chains
aggregate in parallel to form a three-stranded right-handed coiled-coil
molecule with 10 residues in three turns (Fig. 2; Fraser et al., 1979, 1983;
Ramachandran and Kartha, 1955; Rich and Crick, 1955, 1961).

The structural roles of the glycine residues on the one hand, and the 
residues in the X and Y positions on the other, are quite distinct. The 
glycines are a major determinant of the geometry of the triple helix and 
are internally located close to the axis of the molecule. Their small size 
makes them the only residue capable of fitting into the available space
without causing a major distortion to the structure of the triple helix. 



Fig. 2. (A) The molecular structure of collagen contains three left-handed helical
α-chains that coil about one another in a right-handed manner. As a consequence of the
glycine-based triplet substructure in the collagen sequence, the glycine residues in all 
three chains lie close to the axis of the molecule, and are the only residues able to fit 
into the central space available due to their size. The model was produced by Fraser et al. 
(1979) using a linked-atoms least-squares technique refined against quantitative X-ray 
diffraction data. (B) Space-filling version of the collagen structure illustrated in (A). 
Figure from Fraser et al. (1987) with permission from Academic Press. 


In contrast, the X and Y residues lie totally on the surface of the molecule,
where they may interact readily with other molecules and hence specify 
the mode of molecular aggregation. The silk from the gooseberry sawfly 
Nematus ribesii gives a collagen-type X-ray diffraction pattern, which is
strongly indicative that a triplet substructure also exists in its amino acid 
sequence. 
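
The glycine requirement of the (G-X-Y)n substructure is simple enough to state as a check on a sequence. The Python sketch below is illustrative only: the function name and the toy nine-residue sequences are assumptions, and only the value n = 338 is taken from the text.

```python
def has_gly_triplet_pattern(chain):
    """Return True if glycine opens every triplet of a (G-X-Y)n sequence."""
    return len(chain) % 3 == 0 and all(res == "G" for res in chain[0::3])


TRIPLETS_TYPE_I = 338  # n in (G-X-Y)n for the Type I collagen chain, from the text

# Number of residues spanned by 338 contiguous triplets.
print(TRIPLETS_TYPE_I * 3)

print(has_gly_triplet_pattern("GPPGAKGER"))  # toy sequence with glycine in every third position
print(has_gly_triplet_pattern("GPPAAKGER"))  # glycine replaced in the second triplet
```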


C. Type C Repeats 


A very large number of Type C repeats have been recorded. Examples of 
these include: human neurofilament chain (NF-H) with 89% identity 
among the nine contiguous repeats in a 14-residue consensus sequence 
K-S-P-E-K-A-K-S-P-V-K-E-E-A (Lee et al., 1988); human ribosome 
receptor with 87% sequence identity among the 54 contiguous repeats 
in a 10-residue consensus sequence N-Q-G-K-K-A-E-G-A-Q (Langley 
et al., 1998); desmoyokin, a keratinocyte plasma membrane-associated
protein, with 78% identity among the ten contiguous repeats in a
128-residue consensus sequence (Hashimoto et al., 1993); a protein from the
Giardia median body with 47% identity among the 11 contiguous repeats in a
24-residue consensus sequence (Marshall and Holberton, 1993); and human
erythrocyte α-spectrin with 21% identity among 20 contiguous repeats in a
106-residue consensus sequence (Sahr et al., 1990). 

A good example of a significant species difference is given by the repeats 
in the tail domain of nestin (an intermediate filament from neuronal 
cells). In hamsters, the sequence displays 18 consecutive repeats, with each 
being 44 residues in length. Human nestin, in contrast, was shown to have 
22-residue repeats with only 14 in tandem. The sequences of both repeats,
nonetheless, showed a close relationship (Steinert et al., 1999b). The
22-residue repeat has an underlying quasi-repeat of just 11 residues 
(consensus sequence K/E-E-D/N-Q-E-X-L-R/K-X-L-E). 

The seven-residue heptad repeat is one of nature’s most frequently used 
oligomerization motifs. Each repeat has the form (a-b-c-d-e-f-g)n, where
positions a and d are generally filled by apolar residues (occupancy rate is
approximately 75% in both positions). The chains adopt an α-helical
conformation with about 3.6 residues per turn; since the apolar residues
are, on average, 3.5 residues apart, they form a left-handed stripe that
winds around the axis of the right-handed α-helix. When two or more
α-helices aggregate, the apolar residues become internalized in a
knob-hole form of close packing and are shielded from the water. The
coiled-coil structures thus formed display a range of pitch lengths. Two-stranded
GCN4, for example, has a pitch length of about 20.4 nm (Fig. 3; Harbury
et al., 1993; Kühnel et al., 2004), but three- and four-stranded coiled coils


Fig. 3. The structure of the 33-residue region of yeast transcription factor GCN4 is a
two-stranded coiled coil, and is viewed here perpendicular to its long axis. The chains
each have a heptad substructure and an α-helical conformation. Because GCN4
contains leucine residues in each d position, except for the most C-terminal one, the 
structure is commonly referred to as a leucine zipper (PDB coordinate reference 
number 2ZTA). The pitch length of the left-handed coiled coil has an average value of 
about 20.4 nm (Harbury et al., 1993; Kühnel et al., 2004). 


generally display somewhat greater values (see methodology of Strelkov 
and Burkhard, 2002). It is worth reiterating that the quantal repeat (seven 
residues) does not depend structurally or functionally on the need to 
have conserved residues in any particular position. The character, how- 
ever, is strongly maintained through the presence of apolar residues in 
positions a and d, and by the frequent occurrence of charged residues
(DEKR) in positions e and g. The latter play an important role in forming
interchain ionic interactions that specify both the relative chain stagger
and the relative polarity of those chains. The heptad repeat is a very
common feature of proteins, most specifically the α-fibrous class. This
includes the muscle proteins myosin and tropomyosin, the intermediate
filament proteins, the plakin superfamily of intermediate filament
associated proteins, and fibrinogen, among many others.

Four-stranded coiled coils, both with 11-residue repeats, have been 
recognized in tetrabrachion (residues 19-52; Ozbek et al., 2004; Stetefeld 
et al., 2000) and RH4 protein (Burkhard et al., 2001). These repeats are 
equivalent to two heptads with a stutter (i.e., there is a three-residue 
deletion in an otherwise continuous heptad substructure; Brown et al., 
1996). On average, the apolar residues are 11/3, or 3.67 residues apart. As 
this value is slightly greater than the average number of residues per turn 
in an α-helix, the coiled-coil structure will have a slight right-handed twist;
pitch lengths of 128 and 84 nm have been recorded (Kühnel et al., 2004). 
However, a short length of sequence at the N-terminal end of the tetra- 
brachion chain (residues 4-18) has a four-residue insert and this, in effect, 
leads to a 15-residue repeat. This is equivalent to a pair of heptads plus a 
skip residue. A skip is conformationally equivalent to a pair of stutters. In 
this instance, the average separation of apolar residues is 15/4, or 3.75 
residues. This is considerably larger than the average number of residues 
per turn in an average α-helix, and the right-handed pitch length for
this segment is consequently much shorter (32.2 nm) than those observed 
for structures with 11-residue repeats. Further support for this conclusion 
comes from the work of Kühnel et al. (2004), who have recently solved the 
structure of human vasodilator-stimulated phosphoprotein (VASP). This 
contains a pair of 15-residue repeats with the right-handed pitch length for 
this structure calculated to be 18.5 nm, a value directly comparable to that 
seen in two-stranded left-handed coiled coils (Fig. 4). 
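
The reasoning that links repeat length to the hand of the coiled coil can be checked with a short Python sketch. The function names are hypothetical, and the value of 3.6 residues per turn is the approximate figure quoted in the text. The sketch compares the average spacing of apolar positions with the α-helical repeat and reports only the sign and size of the mismatch; it makes no attempt to predict pitch lengths.

```python
from fractions import Fraction

ALPHA_HELIX_RESIDUES_PER_TURN = Fraction(36, 10)  # "about 3.6" in the text


def core_spacing(repeat_length, core_positions):
    """Average separation of apolar (core) positions, in residues."""
    return Fraction(repeat_length, core_positions)


def supercoil_handedness(repeat_length, core_positions):
    """Hand of the coiled coil implied by the apolar spacing."""
    spacing = core_spacing(repeat_length, core_positions)
    if spacing < ALPHA_HELIX_RESIDUES_PER_TURN:
        return "left"
    if spacing > ALPHA_HELIX_RESIDUES_PER_TURN:
        return "right"
    return "straight"


for name, length, core in [("heptad", 7, 2),
                           ("11-residue", 11, 3),
                           ("15-residue", 15, 4)]:
    spacing = core_spacing(length, core)
    offset = float(spacing - ALPHA_HELIX_RESIDUES_PER_TURN)
    hand = supercoil_handedness(length, core)
    print(f"{name}: spacing {float(spacing):.2f}, offset {offset:+.2f}, {hand}-handed")
```

The larger offset for the 15-residue repeat than for the 11-residue repeat is in line with the much shorter right-handed pitch reported for the former.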

Another example of a quantal repeat—but with considerable variation 
in sequence—is seen in the keratin-associated proteins (KAPs). In sheep, 
these display pentapeptide and decapeptide consensus repeats of the form 
C-C-Q-P-S/T and C-C-Q/R-P-S/T-C/S/T-C-Q-P/T-S, respectively 
(Parry et al., 1979). Some of the positions, as indicated by the presence 
of a consensus sequence, contain residues that occur much more fre- 
quently than others, but the absolute conservation of a residue in any 
position is not observed. The decapeptide consists of a pair of five-residue 
repeats closely related, but different from that displayed by the pentapeptide.
Although the repeats have an undetermined structure, the similarity of the 
repeat to a sequence in snake neurotoxin suggests that the pentapeptides 
will adopt a closed loop conformation stabilized by a disulphide bond 
between cysteine residues four apart (Fig. 5: Fraser et al., 1988; Parry et al., 
1979). Relative freedom of rotation about the single bond connecting 
disulphide-bonded knots would give rise to the concept of a linear array of 
knots that can fold up to form a variety of tertiary structures. The KAPs
display imperfect disulphide stabilization of knots and have interacting 


Fig. 4. The structure of human vasodilator-stimulated phosphoprotein (VASP) is
that of a parallel chain, four-stranded coiled-coil (Kühnel et al., 2004). The sequence 
contains a pair of 15-residue repeats, resulting in the formation of a right-handed coiled 
coil with a pitch length of 18.5 nm, a value directly comparable to that seen in left- 
handed two-stranded coiled coils. Figure courtesy of Sergei Strelkov. 


sites that differ subtly from one another, presumably for functional 
reasons. 

SH3 domains are used extensively by cytoskeletal and signaling proteins 
to mediate protein-protein interactions, and they do so through a proline- 
rich motif. This has a consensus sequence P-X-X-P. Another motif—the 
WW domain—also facilitates protein-protein interactions, and it too is 
based on proline. Its consensus sequence is P-P-X-Y (a Type I WW repeat 
as identified in the extracellular matrix receptor β-dystroglycan) and this
interacts with a WW domain in dystrophin or utrophin (Ilsley et al., 2002; 
Winder, 2001). These two motifs are interesting, not only because they are 
short and proline-rich, but because they are able to impose considerable 
specificity of interaction on the proteins involved. 
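
Short consensus motifs of this kind are easy to locate in a sequence with a regular expression. The Python sketch below is an illustration under stated assumptions: the helper name and the toy sequence are invented, X is taken to mean any residue, and a lookahead is used so that overlapping occurrences are all reported. A match shows only that the consensus is present, not that the site is functional.

```python
import re

# Lookahead patterns, so that overlapping occurrences are not skipped.
PXXP = re.compile(r"(?=(P..P))")  # P-X-X-P consensus
PPXY = re.compile(r"(?=(PP.Y))")  # P-P-X-Y consensus


def find_motif(pattern, sequence):
    """Return (1-based start position, matched text) for every occurrence."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


toy_sequence = "MSPPPYAPLPPAPS"  # invented sequence, for illustration only
print(find_motif(PXXP, toy_sequence))
print(find_motif(PPXY, toy_sequence))
```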


Fig. 5. Predicted conformation of the pentapeptide repeat C-X-Y-Z-C in trichocyte
keratin-associated proteins. Glutamine and arginine residues are found commonly in
the X position, prolines in the Y position, and serines and threonines in the Z position.
The structure is based on the known conformation of a similar repeat in snake
neurotoxin. The model shows a disulphide bond-stabilized β-bend with a potential
hydrogen bond (dotted). A string of these β-bends, linked by bonds about which there
is relatively free rotation, has been proposed as a model for this important family of 
matrix proteins in trichocyte keratin (Fraser et al., 1988). Figure from Fraser et al. (1988) 
with permission from Elsevier. 


D. Type D Repeats 


The structure of intermediate filament protein molecules is a tripartite 
one with globular head and tail domains separated by a rodlike structure 
of typical length 46 nm. The latter consists of four α-helical
heptad-containing coiled-coil segments (1A, 1B, 2A, 2B) connected by linkers
L1, L12, and L2, respectively (http://www.interfil.org; Parry and Steinert,
1995). The 1B and 2A + 2B regions both have regular linear distributions 
of their acidic and basic residues, thus giving rise to alternating acidic and 
basic residue banding. The observed periods are 9.54 residues (1.42 nm) 
and 9.84 residues (1.46 nm), respectively. On the basis of maximizing 
intermolecular ionic interactions, five possible modes of molecular aggre- 
gation were proposed by Crewther et al. (1983). Three of these modes 
involved antiparallel alignments and were substantiated experimentally by 
crosslinking studies (Steinert et al., 1993a,b,c, 1999a). They arose from an 
alignment in different molecules of: (1) the 1B segments (the A11 mode);
(2) the 2B segments (the A22 mode); and (3) the 1B and 2B segments (the
A12 mode) (Fig. 6).


Fig. 6. The longest coiled-coil segments in an intermediate filament chain are
known as segments 1B and 2B. Both display a regular distribution of their acidic and 
their basic residues (period in both charged types is about 9.54 residues in segment 1B, 
and about 9.84 residues in segment 2B). The bands of opposite charge in these 
segments are indicated by the red and the white stripes. Alignment of molecules to 
maximize potential intermolecular ionic interactions can be achieved with either 
parallel or antiparallel arrangements, though crosslinking data have shown the 
presence only of the latter. The red bands lie opposite the white bands to indicate 
one of a family of possible axial staggers that maximize the potential of forming 
intermolecular ionic interactions. Antiparallel alignment occurs between (A) two 1B
segments (A11 mode), (B) two 2B segments (A22), and (C) entire molecules (A12). (D)
A fourth mode of assembly (ACN) involves a small head-to-tail overlap (typically about
seven residues) between similarly directed molecules. 


Tropomyosin is a two-stranded, α-helical coiled-coil molecule that
aggregates head-to-tail with others to form long filamentous ropes. These
lie in each of the two long period grooves of the actin microfilaments
where, in vertebrate skeletal muscle, they play an important part in the
Ca²⁺-mediated regulation of actin via troponin (a tropomyosin-associated
protein). An important feature of tropomyosin is its 39.2-residue period
(also quasi-halved to 19.6 residues) in the linear distribution of the
acidic residues and, to a lesser extent, the apolar residues (McLachlan and
Stewart, 1976; Parry, 1975). The number of residues in tropomyosin (284
residues), and the head-to-tail overlap (nine residues) that allows axial
aggregation to occur, generate an effective repeat length of 275 residues.
This is almost exactly equal to seven times the observed sequence repeat 
(7 × 39.2 = 274.4 residues). In other words, a repeat length of
tropomyosin gives rise to a sevenfold copy of a pair of segments with similar
sequences (the so-called α- and β-sites). Just as importantly, the
39.2-residue period in tropomyosin (39.2 × 0.1485 = 5.82 nm) corresponds
exactly to the separation of the actin monomers in the thin filament. This
means that each actin is capable of being regulated by tropomyosin in the 
same manner (Fig. 7; McLachlan and Stewart, 1976; Parry, 1975; Parry and 
Squire, 1973; Phillips et al., 1986). This is perhaps a classic example of how 
proteins with completely different helical symmetries are able to assemble 
and interact (and function) in a highly specific manner. When the muscle 
is relaxed, the site on actin that interacts with the myosin head is blocked 
by the α-site of tropomyosin, and myosin-ATPase activity is not activated.
However, when the muscle is activated (through a Ca²⁺ release
mechanism), tropomyosin is believed to roll across the surface of the thin
filament and thus allow the β-sites to interact with actin. Two possible
substates here, however, have been suggested. One of these allows weak
binding of myosin heads to the thin filaments and has low ATPase activity, 
and the other is a fully active site that allows strong binding of myosin to 
the thin filaments and gives the greatest ATPase activity (see, for example, 
Brown et al., 2001). Further details of the regulation of vertebrate skeletal 
muscle by tropomyosin are given by Squire and Morris (1998). 
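
The arithmetic behind this match between the tropomyosin repeat and the actin spacing can be checked in a few lines. The sketch below is only an illustration: the variable names are ours, and every number is taken directly from the paragraph above.

```python
# Tropomyosin repeat arithmetic, using only the figures quoted in the text.
RESIDUES_PER_MOLECULE = 284      # length of one tropomyosin chain
HEAD_TO_TAIL_OVERLAP = 9         # residues shared by successive molecules
SEQUENCE_PERIOD = 39.2           # residues, period of the acidic/apolar zones
RISE_PER_RESIDUE_NM = 0.1485     # axial rise per residue used in the text

effective_repeat = RESIDUES_PER_MOLECULE - HEAD_TO_TAIL_OVERLAP   # residues
seven_periods = 7 * SEQUENCE_PERIOD                               # residues
period_nm = SEQUENCE_PERIOD * RISE_PER_RESIDUE_NM                 # nm per period
half_period_nm = period_nm / 2                                    # nm per half period

print(effective_repeat, round(seven_periods, 1), round(period_nm, 2))
```

The half period (about 2.9 nm) is the axial distance between the two sets of sites that are thought to alternate when tropomyosin rolls across the actin surface.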

The α-fibrous proteins myosin (http://motility.york.ac.uk/myosins.shtml)
and paramyosin in muscle have structural similarities to one
another, with each having a heptad substructure and a resulting long
coiled-coil rod domain (about 163 and 122 nm, respectively). In vivo, myosin and
paramyosin coassemble in such a manner that paramyosin forms the core 
of the thick filaments and myosin lies on their surfaces (Cohen and Parry, 
1998). In order for this to happen, there must be complementary inter- 
actions between the molecules; indeed, there are periodic distributions of 
acidic and basic residues in both myosin and paramyosin. The periods are 
identical in both proteins for both residue groupings, and in each case 
these are equivalent to about 9.4 residues. The fundamental period, 
however, lies close to 28 residues and arises from the beating together 
of the 28/3 and heptad repeats (Cohen et al., 1987; Kagawa et al., 1989; 
McLachlan and Karn, 1983; Parry, 1981). Consequently, appropriate
alignment of the acidic and basic residue clumps in the two molecular species
(i.e., relative axial staggers of 28 [n + 0.5] residues) allows the formation 
of many stabilizing intermolecular ionic interactions. This is compatible 
with the periods seen experimentally in muscle filaments using both X-ray 
diffraction and electron microscope techniques. 
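
The beat between the two repeats can be verified with exact fractions. This is a minimal sketch: the function name is ours, and it simply applies the standard beat relation 1/|1/p1 − 1/p2| to the heptad (7 residues) and the 28/3-residue charge period named above.

```python
from fractions import Fraction

def beat_period(p1: Fraction, p2: Fraction) -> Fraction:
    """Period of the beat between two repeats of lengths p1 and p2 (residues)."""
    return 1 / abs(1 / p1 - 1 / p2)

heptad = Fraction(7)
charge_period = Fraction(28, 3)

fundamental = beat_period(heptad, charge_period)
print(fundamental)

# Relative axial staggers of 28(n + 0.5) residues, for n = 0..3
staggers = [28 * (n + Fraction(1, 2)) for n in range(4)]
print([float(s) for s in staggers])
```

The staggers fall at odd multiples of 14 residues, which is where the acidic clumps of one molecule face the basic clumps of its neighbor.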


Fig. 7. (A) Tropomyosin contains two right-handed α-helical chains (in blue and
yellow) that are parallel and in axial register. These chains coil around one another in a
left-handed manner typical of a coiled-coil molecule. In turn, head-to-tail assemblies of
parallel tropomyosin molecules follow the long period right-handed grooves in the thin
filament. Each tropomyosin molecule has 14 zones with an acidic and an apolar part 
(McLachlan and Stewart, 1976; Parry, 1975). These are represented by horizontal black 
bars. For a supercoil pitch length of 13.7 nm, tropomyosin will make seven half turns 
relative to the seven actin monomers with which it makes contact. Alternate zones will 
have identical azimuthal aspects with regard to each actin, thereby allowing each actin 
to be regulated by tropomyosin in an identical or quasi-identical manner. If 
tropomyosin rolls across the actin surface by about 90 degrees, then the second set 
of zones will lie in roughly equivalent azimuthal positions with respect to the first set 
prior to the rolling motion. The relaxed state of muscle is thus stabilized by one set of 
seven tropomyosin-actin interactions and the active state by the second set. (B) Cross- 
section of the thin filament in relaxed vertebrate skeletal muscle illustrating the 
positions of the head-to-tail arrays of tropomyosin molecules on the surface of each of 
the two long-period actin strands. The model was determined using X-ray diffraction 
E. Type E Repeats


Type E repeats can have a structural and/or a functional role in vivo, 
and examples of both are given below. Although in many cases the repeats 
tolerate variation in length, the conformational character is maintained. 
This enables the motif to aggregate appropriately or function as required. 
Most of the currently observed Type E repeats lie within the size range of 
20-50 residues, but there is no reason to believe that this limitation is 
mandatory; examples lying outside this range will probably be found. 

Calcium-binding proteins are important in intracellular Ca²⁺ signal
transduction, in the modulation of Ca²⁺ signals and in Ca²⁺ homeostasis.
The calcium binding sites, which are known as EF hands
(http://structbio.vanderbilt.edu/chazin/cabp_database), have a consensus
sequence (in calmodulin) of the form
E-h-X-X-h-h-X-X-h-D-X-D-G-D-G-X-I-D-X-E/D-E-h-X-X-h-h-X-X-h, where
residues 1-9 and 22-29 adopt an α-helical conformation and residues 10-21
form a turn that acts as the Ca²⁺ binding site (Fig. 8). The symbol h
refers to an apolar residue. Small variations from this consensus do occur
in some of the numerous Ca²⁺ binding proteins that have been characterized
to date. In particular, some of the EF hands contain a small deletion or
insertion with respect to the consensus, thus giving rise to a motif with a
length that is a little different from the idealized 29-residue motif noted
above. Generally, the EF hands occur in pairs, and these are connected by a
short linker. The linker, however, is not well conserved in sequence.
Interestingly, the first EF hand in a pair may form one of two types of
Ca²⁺ binding loops. In the first, it is 12 residues long, calmodulin-like,
and chelates calcium primarily by interactions with side-chain
carboxylates. In the second, it is 14 residues long, S100-like and chelates
calcium mainly through interactions with backbone carbonyls. In EF hands,
there is an appreciable conformational change between the state when Ca²⁺
is bound and the one in which it is not (Fig. 8). In particular, in the
former state, a significant hydrophobic surface is exposed, and this allows
interaction to occur with a peptide,


data (Parry and Squire, 1973). Tropomyosin blocks the normal site of interaction of the 
head of myosin (S-1) on the thin filament. As S-1 is not able to bind actin, it is shown as 
dotted. The two parallel in-register α-helical strands of tropomyosin (in yellow and
blue) are shown (arbitrarily) to lie in a flat-on position on actin. Troponin is illustrated 
as a red circle. There is no significance in this figure as to the actual site on tropomyosin 
at which troponin is shown as binding. (C) As in (B), except that the muscle is now 
shown in a contracting state, wherein the tropomyosin strands have rolled over the actin 
surface, possibly by about 90 degrees, to allow the second set of interacting sites on 
tropomyosin, roughly 2.9 nm distant along the chain, to interact with actin once again 
in a flat-on position. 


Fig. 8. The EF hand represents a widespread Ca²⁺ binding motif that was first
characterized in calmodulin. Generally, EF hands are observed in pairs. Each consists of
two short stretches of α-helix connected by a loop. Two possible sets of ligands have
been identified through which calcium cations bind to the residues comprising the
loop (see text). This figure illustrates the structural forms observed that relate to
different Ca²⁺ binding states (PDB: closed, 1CFC; open, 1CLL; semi-open, 1WDC).
Figure from http://structbio.vanderbilt.edu/chazin/cabp_database/seq.


often helical (Broderick and Winder, 2002). The Ca²⁺ binding motif is a
good example of a functional Type E repeat.
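
A consensus of the kind quoted for the EF hand translates directly into a regular expression. The sketch below is an illustration only: the set of residues treated as apolar (h) is our assumption, the helper name is hypothetical, and the test sequence is synthetic rather than taken from any database entry.

```python
import re

# Assumption: residues counted as apolar ("h") are chosen here for illustration.
APOLAR = "AVLIMFWY"

# Consensus as quoted in the text (29 positions).
CONSENSUS = "E-h-X-X-h-h-X-X-h-D-X-D-G-D-G-X-I-D-X-E/D-E-h-X-X-h-h-X-X-h"

def consensus_to_regex(consensus: str) -> str:
    """Translate a dash-separated consensus into a regular expression."""
    parts = []
    for token in consensus.split("-"):
        if token == "X":
            parts.append(".")                                  # any residue
        elif token == "h":
            parts.append(f"[{APOLAR}]")                        # apolar residue
        elif "/" in token:
            parts.append("[" + token.replace("/", "") + "]")   # alternatives
        else:
            parts.append(token)                                # fixed residue
    return "".join(parts)

EF_HAND = re.compile(consensus_to_regex(CONSENSUS))

# Synthetic sequence constructed to satisfy the consensus (not a real protein).
synthetic = "EFKEAFSLFDKDGDGTIDTEELATVMRSL"
match = EF_HAND.fullmatch(synthetic)
print(len(CONSENSUS.split("-")), bool(match))
```

Because real EF hands carry small insertions and deletions, a strict pattern of this kind would miss many genuine sites; profile-based searches are better suited to degenerate repeats.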

The term zinc finger was originally used to describe the multiple zinc- 
binding motifs with DNA-binding properties in the Xenopus transcription 
factor IIIA. Nowadays, however, it is used more widely to specify a family of 
structurally diverse zinc-stabilized compact domains
(http://prodata.swmed.edu/zndb). The classic zinc finger, as initially described, was up
to 28 residues in length and had a consensus sequence of the form
h-X-C-X₂₋₅-C-X₃-h-X₅-h-X₂-H-X₂₋₅-H. Although this motif, like all Type E
repeats, displays variability in length, the cysteine and histidine residues 
remain regularly placed to act as zinc ligands. Two of these ligands were 
found in the N-terminal β-hairpin structure; the other two
were found in the C-terminally located α-helix. The side chains of residues
that lay close to the N-terminal end of the α-helical strand were shown to
be requisites for binding to the major groove of DNA. It is important to
emphasize here, however, that at least eight separate zinc finger structures
are now known, each with its own distinct sequence characteristics and
conformation, and the various structural forms are described in detail by
Krishna et al. (2003). The number of zinc fingers per protein varies greatly
too, from as few as 2 to as many as 37 (Klug and Schwabe, 1995).

Leucine-rich repeats represent binding motifs found in a wide variety of
both plant and mammalian proteins (Kobe and Kajava, 2001). These are 
involved in a multitude of protein-protein interactions. The sequence of 
porcine ribonuclease inhibitor, for example, displays a leucine-rich repeat 
(LRR) of length 27-29 residues that occurs 15 times in tandem (Fig. 9). 
Likewise, the family of small leucine-rich proteoglycans that includes 
biglycan, decorin, epiphycan, fibromodulin, keratocan, and lumican 


Fig. 9. The conformation adopted by a leucine-rich repeat (LRR) is that of a
β-strand followed by an α-helix. In porcine ribonuclease inhibitor, a β-strand (residues
2-8) is connected to an α-helix (residues 14-27) by a connecting loop (residues 9-13).
A horseshoe-shaped structure is formed and is exemplified in the crystal structure of
ribonuclease inhibitor (PDB 1A4Y: Kobe and Deisenhofer, 1993). This has an inner
concave surface formed by curved β-sheets and an outer convex surface formed by
α-helices. The leucines and other large apolar residues form the hydrophobic core of
the structure. 
has a central region in which 8-10 LRRs occur in tandem (Iozzo and
Murdoch, 1996). The LRR length can vary somewhat and
is, for example, a little shorter in decorin (24 residues) than in
ribonuclease inhibitor, though it too occurs many times (10 times) in tandem. In
other LRR proteins, repeat lengths between 20 and 26 residues have been
observed. However, the key feature related to the regular disposition of
the leucine residues is maintained. The consensus sequence thus has the
general form X-L-X-X-L-X-L-X-X-N/C-X-L-X-X-X-X-X-X-X-L-X-X-X-L-X-X-X,
but the first part, L-X-X-L-X-L-X-X-N/C-X-L, represents
the “signature” of the LRR. The conformation adopted by the
motif, irrespective of its exact length, is that of a β-strand followed by
an α-helix. In porcine ribonuclease inhibitor, there are two alternating
repeats, but in both cases the β-strand (residues 2-8) is connected to the
α-helix (residues 14-27) by a connecting loop (residues 9-13). The
horseshoe-shaped structure thus formed and exemplified in the crystal structure
of ribonuclease inhibitor (Kobe and Deisenhofer, 1993) has an inner
concave surface formed by curved β-sheets and an outer convex surface
formed by α-helices. The leucines and other large apolar residues form
the hydrophobic core of the structure (Fig. 9).
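
The LRR signature lends itself to a simple pattern search. The following is a minimal sketch, assuming a plain regular-expression scan is adequate for illustration; the function name is hypothetical and the 24-residue repeat is a synthetic sequence, not one drawn from decorin or ribonuclease inhibitor.

```python
import re

# LRR signature quoted in the text: L-X-X-L-X-L-X-X-N/C-X-L
# The lookahead lets overlapping candidates be reported as well.
LRR_SIGNATURE = re.compile(r"(?=(L..L.L..[NC].L))")

def find_lrr_signatures(sequence: str) -> list[int]:
    """Return the 0-based start position of every signature match."""
    return [m.start() for m in LRR_SIGNATURE.finditer(sequence)]

# Synthetic 24-residue repeat that begins with the signature (not a real protein).
repeat = "LKELDLSNNQLGDAGAKALAEALK"
sequence = "MS" + repeat + repeat

starts = find_lrr_signatures(sequence)
spacings = [b - a for a, b in zip(starts, starts[1:])]
print(starts, spacings)
```

The spacing between successive hits gives the local repeat length, which is the quantity that varies between LRR proteins while the signature itself is retained.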

HEAT-like repeats
(http://www.embl-heidelberg.de/~andrade/papers/rep/search.html) have been
recognized in two proteasome-binding proteins, PA200 and Ecm29 (Kajava et
al., 2004). Much of the sequence of each of these two proteins can be
accounted for in terms of multiple repeats, of which there are about 29 in
PA200 (each about 38-50 residues long) and about 18 in Ecm29 (where the
motif is similar in length). These repeats are degenerate and difficult to
recognize but, nonetheless, retain their structural identity. In spite of
the degeneracy, the character of the repeat in terms of the type of amino
acids present in most positions is maintained. The HEAT-like repeats seen
here can, in many respects, be considered as an extreme case of "poor"
definition in what is normally understood by the term sequence repeat.
Structurally, the motif is expected to adopt an α-helical solenoid
conformation, as deduced from knowledge of the structure of 11SREG (a
proteasome regulator), a related protein with many HEAT repeats (Whitby et
al., 2000).


IV. SUMMARY 


Sequence repeats in proteins make use of structural and functional 
motifs that nature has found to work well in vivo. Large numbers of 
these motifs have now been characterized, and any newly determined amino 
acid sequence is generally run through computer databases to assess the 
number and type of repeats present, their possible conformations, and 
their likely functions in vivo. The characteristics of the sequence motifs 
differ greatly; variations have been distinguished relating to size, the
degree of internal conservation of the sequence, and whether or not the 
repeats are contiguous or separated from one another. 

This review has concentrated on the fibrous proteins. These are, by 
definition, generally characterized by the presence of regular short-range 
sequence repeats and (usually) a helical arrangement of those structural 
motifs, thereby generating a structure with a high axial ratio. In virtually all 
cases, these elongate molecular structures are interspersed with or termi- 
nated by globular domains. Repeats in these regions have also been noted 
where appropriate. 

In the past, there was a tendency to associate structural roles with the 
rod domains in a fibrous protein, and the functional roles with the 
globular regions. The recent acquisition of considerable quantities of 
structural data available from X-ray diffraction and NMR studies has shown 
that, while this generalization retains some value, the concept should no 
longer be taken too literally. Many exceptions have now been observed. 
For example, the observation that globular regions can play a significant 
structural role in vivo is exemplified by the H1 subdomains in the
“globular” head domains of epidermal keratin chains. These have been shown to
stabilize oligomers at the early stages of filament assembly. Likewise, it has 
been demonstrated that rod domains can play an important functional 
role, as seen in the regulatory mechanism of vertebrate skeletal muscle. 
This involves the movement of tropomyosin molecules across the sur- 
face of the actin monomers in the thin filaments, thereby providing the 
mechanism by which actins can all be regulated in a like manner. 

Although there are many more sequence repeats in fibrous proteins 
than have been detailed in this review, it is believed that all of them, 
together with those noted in some detail in the preceding text, can be 
classified into one of only five general types. Each has its own structural 
and functional characteristics, knowledge of which is becoming increas- 
ingly useful as the sequences of ever larger proteins become known. 
Indeed, the bulk of superlong sequences can be accounted for by a linear 
array of well-characterized motifs, of which (in most cases) the three- 
dimensional structures are known. Consequently, it is becoming more 
and more probable that a structure relating to a new protein sequence 
can be determined in terms of an array of motifs of known conformation. 
Protein prediction techniques, allied to molecular dynamic calculations, 
thus have the potential to allow the complete structure to be formulated 
without the need for X-ray or NMR methods. Although the field is not 
yet quite at that stage, it may well become a reality within the next decade 
or so. 
ABSTRACT


α-Helical coiled coils are versatile protein domains, supporting a
wide range of biological functions. Their fold is probably better
understood than that of any other protein; indeed, uniquely among folds,
their structure can be computed from a set of parametric equations. 
Here, we review the principles of coiled-coil structure, the determinants 
of their folding and stability, and the diversity of structural forms they 
assume. 


I. HISTORICAL INTRODUCTION 


The first investigations into the structure of coiled coils were made by 
William Astbury at the University of Leeds in the 1930s. Astbury had 
worked with Sir William Bragg at the Royal Institution in London in the 
mid 1920s and, at Bragg’s request, had obtained X-ray diffraction patterns 
for wool and silk. After moving to Leeds in 1928, he began applying X-ray 
diffraction systematically to a large spectrum of natural fibers and related 
materials. His main interest was textile fibers, particularly wool in its native 
(unstretched) and denatured (stretched) forms, since Leeds is located at
the heart of Britain’s textile industry. He also studied many protein fibers
from other sources (such as porcupine quills, horse hair, horn, and
tendons) and discovered the existence of three main spectral forms: an
α-form exemplified by unstretched wool, a β-form exemplified by
stretched wool, and a γ-form corresponding to tendon. All three forms
showed a high degree of fiber regularity and became the targets of
numerous modeling studies. The α-form was the most common of the
three spectral forms; Astbury named the class of proteins showing this
diffraction pattern k-m-e-f, for keratin, myosin, epidermin, and fibrinogen.
It showed strong meridional arcs at 5.15 Å (indicating the repeating unit
of the structure) and a group of equatorial reflexions at around 10 Å and
again at 27 Å.

From 1948 on, Linus Pauling at the California Institute of Technology 
and Lawrence Bragg’s group at the Cavendish Laboratory in Cambridge 
became engaged in a race for the correct structural model of the α-form.
Pauling won in 1950 with the announcement of two hydrogen-bonded
helical structures (Pauling and Corey, 1950; Pauling et al., 1951), one of
which was the α-helix (the other structure, a helix with 5.1 residues per
turn, has never been observed in nature). The α-helix had a rise of 1.5 Å
per residue and 3.6 residues per turn, yielding a periodicity of 5.4 Å, not
5.15 Å. Pauling had ignored the meridional arcs at 5.15 Å because of the
diffraction pattern of a synthetic fiber, poly-γ-methyl-L-glutamate, which
was clearly of the α-type but lacked the meridional arcs, instead showing
reflexions at 5.4 Å away from the meridian. The α-helix was soon
confirmed by Perutz (1951), who discovered the 1.5 Å diffraction spot required
by this structure; it had hitherto been missed because of the size of the 
photographic plates and the angle between fiber and X-ray beam used 
for measurements. The prominent meridional arc of k-m-e-f proteins, 
however, remained unexplained. 

It seemed clear that, while the α-helix was the dominant structural
element in proteins with an α-type diffraction pattern (hence the ‘α’ in
α-helix), it needed to be further modified to obtain the fiber structure.
This insight led both Crick and Pauling to consider superhelical
distortions arising from the packing of neighboring α-helices, reports of which
they submitted independently of each other to Nature at the end of 1952
(Crick, 1952; Pauling and Corey, 1953). In his description of the α-helix
(Pauling et al., 1951), Pauling had dismissed the need for the number of 
residues per turn to be an integer or a ratio of small integers, but had 
envisaged the possibility that in a crystalline arrangement, the regular 
packing of the helices could favor deformations into configurations with 
a rational number of residues per turn, such as 11/3 (11 residues over 3
turns), 15/4, and 18/5. The idea that packing interactions would lead to
deformations of the α-helix appears to anticipate the key insight needed
for the coiled coil, but at this time Pauling was clearly still thinking in
terms of nonnative, crystalline interactions and not of native interactions
in a fiber. Conspicuously absent from his list of periodicities is 7/2, the
signature periodicity of the coiled coil. Two years later, he took up this idea
again, proposing that the superhelical distortion arising from α-helices
twisting around each other could account for the meridional arc in the 
diffraction spectrum of k-m-e-f proteins (Pauling and Corey, 1953). He 
envisaged a range of possible sequence periodicities (4/1, 7/2, 15/4), 
leading to supercoils with senses of twist both the same and opposite to 
those of the constituent helices. He also envisaged various stoichiometries 
for the helical bundles (which he called compound helices), including one 
with six helices coiling around a straight seventh one. Unlike his earlier 
paper on the α-helix, this paper did not offer quantitative parameters
for the model structures. Also, side-chain packing played no role in his 
considerations. Instead, supercoiling resulted from the exact repetition 
of short sequences, which caused periodic fluctuations in backbone 
hydrogen-bond lengths. 

In contrast, Crick’s communication in Nature and two subsequent 
papers in Acta Crystallographica offered a detailed and fully parameterized 
model for a sequence periodicity of 7/2 (Crick, 1952, 1953a,b). He 
showed that, if α-helices were to twist around each other at an angle of
about 20 degrees, their side chains would interlock systematically along
the core of the structure, repeating the same interactions every 7 residues
(or two turns of the α-helix). The close packing interactions of
hydrophobic residues in the core would provide the energy required to distort the
helices, a remarkable insight at a time when the biophysics of protein
folding was unknown and even the exact sequence of a protein still
remained to be determined. Crick called the bundle of supercoiled
helices a coiled coil and referred to the regular packing of side chains as
“knobs” into “holes.” The coiled coils would contain two or three parallel
α-helices, supercoiled in the opposite sense of the helices. (Note that since
Pauling’s initial α-helix model was left-handed, Crick’s coiled coil was
right-handed, although he also considered bundles of right-handed α-helices
and of right- and left-handed combinations, stating that the question of
helical handedness “must still be regarded as open.”) Crick’s model and
the diversity of coiled-coil structures determined in the 50 years since, 
some of which surprisingly match Pauling’s conjectures, form the basis of 
this Chapter. 
II. STRUCTURAL PARAMETERS


A. The Standard Model 


Coiled coils are bundles of α-helices that are wound into
superhelical structures (Fig. 1). Most commonly, they consist of two, three, or
four helices, running in the same (parallel) or in opposite (antiparallel) 
directions, but structures with five and more helices have been deter- 
mined. They are usually oligomers either of the same (homo) or of 
different chains (hetero), but on occasion consist of consecutive helices 
from the same polypeptide chain, which in that case almost always have an 
antiparallel orientation. 

As proposed by Crick (1952, 1953a,b), the coiled coil’s hallmark is the 
distinctive packing of amino acid side chains in the core of the bundle, 
called knobs-into-holes, in which a residue from one helix (knob) packs into 
a space surrounded by four side chains of the facing helix (hole). This 
geometry contrasts with the more irregular packing of helices in globular 
proteins, often referred to as ridges-into-grooves (Chothia et al., 1977), in 
which a residue packs above or beneath the equivalent residue from the 
facing helix. (See the structure of spectrin in Fig. 2b for an example of 
a protein in which two helices interact via knobs-into-holes, while the third 
is packed via ridges-into-grooves.) Although knobs-into-holes packing is 
often also referred to as in register, this is not strictly true because the 
amino acid side chains do not point straight away from the helix but are 
angled towards the amino-terminus. In antiparallel coiled coils, an optimal 
interaction of side chains is therefore obtained when they point towards 
each other and their Cα carbons are out of register.

The regular meshing of side chains in knobs-into-holes packing requires 
that they occupy periodically equivalent positions along the bundle 
interface. This is not possible with undistorted α-helices, which have
approximately 3.6 residues per turn (Chothia et al., 1981); the position
of side chains on their surface drifts continuously. By giving the
right-handed α-helices a left-handed twist, coiled coils effectively reduce the
number of residues per turn to 3.5 with respect to the supercoil axis, and 
thus allow the position of side chains to repeat after two turns (or seven 
residues). The residues engaged in knobs-into-holes interactions are usu- 
ally hydrophobic, whereas the outer residues are hydrophilic; as already 
surmised by Crick (1953b), the sequence of coiled coils therefore shows a 
heptad repeat in the chemical nature of side chains. Schematically, the 
seven structural positions are labeled a-g, with a and d denoting the 
hydrophobic residues (Fig. 2c). 
The orientation of hydrophobic side chains in the core of coiled coils is
different for positions a and d (Fig. 2b), as first described by Harbury et al.
(1993). In a two-stranded, parallel coiled coil, the Cα-Cβ bond of a
residue in position a is parallel to the peptide bond facing it in the
opposing helix; in position d, it is perpendicular. Thus, the distance
between the Cβ carbons of core residues in position a (a layers) is large,
favoring β-branched side chains for tight packing (Ile, Val, Thr), whereas
it is small for core residues in position d (d layers), favoring residues
unbranched at Cβ (Leu, Ala). This situation is exactly reversed in parallel,
four-stranded coiled coils. In three-stranded coiled coils, the angle is 
intermediate, leading to a packing orientation called acute. In antiparallel 
coiled coils, core packing layers consist of residues in both a and d, thus 
showing mixed geometries (Fig. 2b). 

The idealized structure of the coiled coil has been parameterized by 
Crick (Figs. 2a and 3). The distance required for the superhelix to 
complete a full turn is called the pitch (P), and the angle of a helix 
relative to the superhelical axis is called the pitch angle (α) [also called


Fig. 3. Schematic representation of a tetrameric coiled coil, showing the main
parameters. O marks the center of one α-helix, A the Cα position of a constituent
residue, and C the superhelix axis; r₁ is the α-helix radius, r₀ the superhelix radius,
α the pitch angle, Ω the pairwise helix-crossing angle, φ the positional orientation angle
of a residue or phase of the helix, and ω the phase of the supercoil.
superhelix crossing angle, x (Harbury et al., 1994) or tilt angle (Offer
and Sessions, 1995)]. The angle between two neighboring helices is the
pairwise helix-crossing angle Ω; in two-stranded coiled coils, Ω equals 2α.
The vector connecting the center of a helix to the superhelical axis gives
the superhelix radius (r₀) and that connecting the center of a helix to the
Cα carbons of its constituent residues gives the α-helix radius (r₁). The
angle between the r₁ vectors for two consecutive residues is the phase shift
of the α-helix (Δφ) and the angle between two consecutive r₀ vectors is the
phase shift of the supercoil (Δω); the angle between the r₁ and r₀ vectors
for the same residue is the positional orientation angle (φ; Harbury et al.,
1993) or Crick angle (α; Strelkov and Burkhard, 2002), which gives the
location of a residue relative to the supercoil axis (the choice of α is
somewhat unfortunate since α already denotes the pitch angle in Crick’s
parameterization). The Cartesian coordinates for a coiled coil according
to Crick are:


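\[
\begin{aligned}
x &= r_0\cos\omega - r_1\cos\omega\cos\varphi + r_1\cos\alpha\,\sin\omega\sin\varphi \\
y &= r_0\sin\omega - r_1\sin\omega\cos\varphi - r_1\cos\alpha\,\cos\omega\sin\varphi \\
z &= P\,(\omega/2\pi) + r_1\sin\alpha\,\sin\varphi
\end{aligned}
\tag{1}
\]
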
In transforming the coordinates x, y, z of an ideal α-helix (e.g.,
polyalanine; Arnott and Dover, 1967), these equations can be represented
for chain i of an n-stranded coiled coil as:

\[
\begin{aligned}
x_i &= r_0\cos(2\pi z/P + \omega_i) + x\cos(2\pi z/P + \omega_i) - y\cos\alpha\,\sin(2\pi z/P + \omega_i) \\
y_i &= r_0\sin(2\pi z/P + \omega_i) + x\sin(2\pi z/P + \omega_i) + y\cos\alpha\,\cos(2\pi z/P + \omega_i) \\
z_i &= z - y\sin\alpha
\end{aligned}
\tag{2}
\]

where ω_i = 2π(i − 1)/n and the pitch angle α is:

\[
\alpha = \arctan(2\pi r_0/P) \tag{3}
\]
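
To make the parameterization concrete, the sketch below evaluates Crick's coordinates for a single chain. It is an illustration under stated assumptions: we read equation (1) as x = r₀cos ω − r₁cos ω cos φ + r₁cos α sin ω sin φ (and correspondingly for y and z), the function names are ours, and the numerical inputs are round illustrative values rather than parameters fitted to any structure.

```python
import math

def crick_ca(omega, phi, r0, r1, pitch):
    """C-alpha coordinates from equation (1); omega and phi are in radians."""
    alpha = math.atan(2 * math.pi * r0 / pitch)      # pitch angle, equation (3)
    x = (r0 * math.cos(omega)
         - r1 * math.cos(omega) * math.cos(phi)
         + r1 * math.cos(alpha) * math.sin(omega) * math.sin(phi))
    y = (r0 * math.sin(omega)
         - r1 * math.sin(omega) * math.cos(phi)
         - r1 * math.cos(alpha) * math.cos(omega) * math.sin(phi))
    z = pitch * omega / (2 * math.pi) + r1 * math.sin(alpha) * math.sin(phi)
    return (x, y, z)

def helix_axis(omega, r0, pitch):
    """Point on the alpha-helix axis at supercoil phase omega (the r1 = 0 limit)."""
    return (r0 * math.cos(omega), r0 * math.sin(omega),
            pitch * omega / (2 * math.pi))

# Illustrative inputs only (assumed round numbers, in angstroms).
R0, R1, PITCH = 4.9, 2.3, 140.0

# Consistency check: every C-alpha must sit exactly r1 away from the helix axis.
deviations = []
for k in range(12):
    omega, phi = 0.07 * k, 1.75 * k          # arbitrary sample phases
    ca = crick_ca(omega, phi, R0, R1, PITCH)
    axis = helix_axis(omega, R0, PITCH)
    deviations.append(abs(math.dist(ca, axis) - R1))

print(max(deviations) < 1e-9)
```

The check holds for any choice of phases because the three r₁ terms form an orthonormal decomposition of the residue's offset from the helix axis; it is a quick way to catch sign errors when transcribing the equations.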


Using the formula of Fraser and McRae (1973), the pitch P can be
calculated from the supercoil radius r₀, the axial rise per amino acid h
(1.495 Å for polyalanine; Arnott and Dover, 1967), and the twist
differential Δt:


\[
P = \frac{2\pi}{\Delta t}\,\left[h^{2} - (r_0\,\Delta t)^{2}\right]^{1/2} \tag{4}
\]


where Δt is derived from the number of residues per turn of an undistorted
α-helix (a = 3.62 for polyalanine; Arnott and Dover, 1967) and the
periodicity of hydrophobic residues (p = 3.5 in a canonical coiled coil) as:


\[
\Delta t = 2\pi\left(\frac{1}{a} - \frac{1}{p}\right) \tag{5}
\]
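
A short numerical sketch shows how these relations combine. It assumes that equation (4) has the form P = (2π/Δt)·[h² − (r₀Δt)²]^(1/2), which is our reading of the Fraser and McRae relation, and it uses the polyalanine values quoted above together with a supercoil radius of 4.85 Å, chosen here purely for illustration. The function names are ours.

```python
import math

def twist_differential(a: float, p: float) -> float:
    """Equation (5): supercoil rotation per residue, in radians."""
    return 2 * math.pi * (1 / a - 1 / p)

def supercoil_pitch(r0: float, h: float, dt: float) -> float:
    """Equation (4): pitch from radius r0, rise per residue h and twist dt."""
    return (2 * math.pi / dt) * math.sqrt(h ** 2 - (r0 * dt) ** 2)

R0 = 4.85          # supercoil radius in angstroms (illustrative choice)
H = 1.495          # rise per residue for polyalanine, from the text

dt = twist_differential(a=3.62, p=3.5)
pitch = supercoil_pitch(r0=R0, h=H, dt=dt)
alpha = math.degrees(math.atan(2 * math.pi * R0 / abs(pitch)))   # equation (3)

print(round(dt, 4), round(pitch, 1), round(alpha, 1))
```

With these inputs the pitch comes out at roughly 155 Å in magnitude and the pitch angle at about 11 degrees, that is, a crossing angle of about 22 degrees for a two-stranded coiled coil. The negative sign of Δt and P follows from p being smaller than a, which we read as a left-handed supercoil under this sign convention. Small changes in a shift the result markedly, which illustrates how sensitive the pitch is to the residues per turn of the undistorted helix.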


In the structure of coiled coils, the values for pitch and crossing angle 
follow directly from the degree of distortion necessary to reach a periodi- 
cally recurring position for the core residues; Phillips (1992), Seo and 
Cohen (1993), and Strelkov and Burkhard (2002) have described methods
for computing the local pitch from coordinates (see also the addendum by
Zhang and Hermans (1993) and errata in the same issue). Undistorted
helices have about 3.64 residues per turn (Chothia et al., 1981) and a rise
of 1.50 Å per residue; for two-stranded coiled coils, these values yield a
pitch of approximately 140 Å and a helix crossing angle of 22 degrees (Seo
and Cohen, 1993). An analysis of various crystal structures (Table I) shows
that individual coiled coils range around these values with comparatively
minor variations; however, because the pitch is very sensitive to these
parameters, even sequences that are nearly identical may yield structures
with substantially different pitch. In the two-, three- and four-stranded
GCN4 leucine zipper variants, the undistorted helices have 3.65, 3.64, and
3.60 residues per turn, respectively, resulting in pitches of 135 Å, 163 Å,
and 188 Å. In addition, because of variations in the sequence and
discontinuities in the heptad pattern, the local pitch may vary substantially
along the length of a coiled coil, occasionally even leading to reversals in 
handedness (Seo and Cohen, 1993; Strelkov and Burkhard, 2002). 


B. Prediction and Analysis Programs 


The strong heptad periodicity of coiled coils and the clear and simple 
parameterization of their structures have made possible a large number 
of computational approaches to their analysis. The earliest sequence- 
based approach was Fourier transform, used initially to detect higher- 
order periodicities superimposed on the basic heptad pattern (see 
e.g., Parry, 1975; McLachlan and Karn, 1983; McLachlan and Stewart, 
1976), but subsequently also for detecting deviations from the heptad 
pattern itself (Hoiczyk et al., 2000; Marshall and Holberton, 1993; Peters 
et al., 1996). In this context, McLachlan developed a Fortran
implementation, SEQFFTX, based on a derivation presented by McLachlan and
Stewart (1976). The current availability of this program is not known to
us. We have implemented a C program for the same purpose (FTwin)
which, unlike SEQFFTX, uses a user-determined scanning window;
the program can be downloaded from our web site
(http://protevo.eb.tuebingen.mpg.de/download).
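
The idea behind a Fourier-based search for sequence periodicities can be sketched in a few lines. This is not the algorithm of SEQFFTX or of our own program; it is a minimal illustration that assumes a crude binary hydrophobicity scale and evaluates the Fourier power of a sequence at a handful of candidate periods. The function names and the test sequence are hypothetical.

```python
import cmath

HYDROPHOBIC = set("AILMFVWY")   # assumption: a crude binary hydrophobicity scale

def periodicity_power(sequence, period):
    """Fourier power of the hydrophobic/polar pattern at one period (residues)."""
    signal = [1.0 if aa in HYDROPHOBIC else 0.0 for aa in sequence]
    mean = sum(signal) / len(signal)
    total = sum((s - mean) * cmath.exp(-2j * cmath.pi * k / period)
                for k, s in enumerate(signal))
    return abs(total) ** 2 / len(signal)

def strongest_period(sequence, periods):
    """Candidate period with the highest power."""
    return max(periods, key=lambda p: periodicity_power(sequence, p))

# Synthetic heptad sequence with apolar residues in positions a and d
# (not a real protein).
heptads = "LEELKKE" * 6
candidates = [2.0, 3.0, 3.5, 4.0, 7.0]

best = strongest_period(heptads, candidates)
print(best)
```

A scanning-window version would apply the same function to successive slices of the sequence, so that local departures from the heptad pattern show up as a shift in the strongest period along the chain.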

A second widely used sequence-based approach relies on matrices of 
residue frequencies, pioneered by Parry (1982). He showed that the
residue distributions at the seven heptad positions of the putative
coiled-coil segments of myosin, tropomyosin, α-keratin, and hemagglutinin are
asymmetric, and proposed a method by which the residue frequencies
could be used to predict whether a sequence of unknown structure would 
form a coiled coil. This approach was implemented with modifications in 


TABLE I 


Values for the Main Structural Parameters in Coiled Coils with Various Periodicities, Determined Using the Program TWISTER 


Periodicity 


7 


11 


15 


18 


25 


GCN4p1 (2ZTA) 
GCN4pII (1GCM) 
GCN4-pLI (1GCL) 


Tetrabrachion 
(1FE6: 19-48) 

YadA (1P9H) 

Tetrabrachion 
(1FE6: 5-18) 

Hemaglutinin 
pH4 (1HTM) 

Phosphoprotein 
of Sendai Virus 
(1EZJ) 


Number of 
strands 


2 


3 


cc radius 7 (A) 
4.85 
6.76 
7.14 
7.59 


6.54 
7.38 


6.75 


7.82 


cc pitch (A) 
134.5 
163.4 
187.9 

11589.7 


165.0 
197.5 


379.4 


342.9 


Residue 
phase IW 


a: 22.91 
: 32.18 
17.72 
: —31.13 
20.65 
: —28.69 


arapa 


Residues 
per turn 


3.62 
3.60 
3.58 
3.67 


3.70 
3.66 


3.63 


3.63 


Rise per 
residue (A) 


97 


AaaNa9 ANY SVANT 
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the Coils program (http://www.ch.embnet.org/software/COILS_form. 
html); the main modifications concern the substitution of residue pre- 
ferences for frequencies, introduction of a scanning window, and a proce- 
dure to scale scores against reference databases in order to obtain 
probabilities (Lupas, 1996a; Lupas et al., 1991). A variant of this approach, 
using pairwise residue correlations, was developed by Berger and collea- 
gues with the program PairCoil (http://paircoil.lcs.mit.edu/cgi-bin/pair- 
coil; Berger et al., 1995); LearnCoil, a further variant that can be trained 
iteratively on a set of target proteins (Berger and Singh, 1997), is available 
on the Internet for two protein sets, histidine kinases (http://learn- 
coil.lcs.mit. edu/cgi-bin/learncoil; Singh et al., 1998) and viral membrane 
fusion proteins (http://learncoil-vmf.lcs.mit.edu/cgi-bin/vmf; Singh et al., 
1999). We have found, however, that using pairwise residue correlations 
leads to an overfitting of the method to the training set; correspond- 
ingly, PairCoil and its derivatives are substantially less sensitive than 
Coils in detecting proteins unlike those in the training set (i.e., multihelical, 
antiparallel, short, or irregular structures; Lupas, 1996b). The most 
recent approach is based on Hidden Markov models (MARCOIL; http:// 
www.isrec.isb-sib.ch/BCF/Delorenzi/Marcoil/index.html; Delorenzi and 
Speed, 2002) and operates without a scanning window, thus removing 
one of the limitations of Coils and PairCoil. All current methods are 
designed to detect unbroken heptad repeats. 
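The principle of scoring a sequence against position-specific residue preferences with a scanning window can be sketched as follows. This is a toy illustration and not the Coils implementation: the preference values are invented for the example, only two classes of heptad position are distinguished, and the scaling of scores into probabilities against reference databases is omitted. All names are hypothetical.

```python
import math

# Hypothetical residue preferences (observed/expected ratios). These numbers
# are invented for illustration; a real matrix would be derived from known
# coiled-coil sequences and would cover all seven positions separately.
CORE = {"L": 3.0, "I": 2.5, "V": 2.0, "A": 1.5}
SURFACE = {"E": 2.0, "K": 2.0, "Q": 1.5, "R": 1.5, "A": 1.2}
DEFAULT = 0.5


def preference(residue, position):
    """Toy preference of `residue` at heptad position 0-6 (a-g)."""
    table = CORE if position in (0, 3) else SURFACE
    return table.get(residue, DEFAULT)


def window_score(segment, register):
    """Geometric mean of preferences, with segment[0] at `register`."""
    logs = [math.log(preference(aa, (register + k) % 7))
            for k, aa in enumerate(segment)]
    return math.exp(sum(logs) / len(logs))


def scan(sequence, window=28):
    """Best score over all seven registers for each window start."""
    return [max(window_score(sequence[i:i + window], r) for r in range(7))
            for i in range(len(sequence) - window + 1)]


coiled = "LKELEDK" * 4        # four ideal heptads
linker = "GSP" * 9 + "G"      # no heptad pattern
print(scan(coiled)[0], scan(linker)[0])
```

Because each window is scored in all seven registers, a window that straddles a stutter or stammer fits no single register well, which illustrates why window-based methods are tuned to unbroken heptad repeats.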

Matrices of residue frequencies have also been used to discriminate 
between two- and three-stranded coiled coils. Again, the first matrices 
showing a difference in residue preferences were compiled by Parry and 
coworkers (Conway and Parry, 1990, 1991); they observed a clear drop in the 
proportion of charged residues in the hydrophobic core of three-stranded 
coiled coils, a loss of bias for basic residues in a and acidic residues in d, as 
well as a more general loss of positional preferences for most amino acids. 
These observations can be interpreted in terms of the increased size and 
decreased solvation of the hydrophobic core in three-stranded structures, 
as well as in terms of the more uniform (acute) geometry of core residues 
(Fig. 2b). Based on such matrices, Woolfson and Alber (1995) developed 
the program Scorer, and Wolf et al. (1997) developed the program Multi- 
Coil (http://multicoil.lcs.mit.edu/cgi-bin/multicoil). Both appear to oper- 
ate with a fair measure of success; however, we are not aware of any 
benchmark of the two programs on current structure databases. A basic 
problem in their operation results from the fact that they use suboptimal 
detection routines to identify potential coiled-coil regions (Scorer uses 
Coiler, a program that must recognize at least four heptads by their hydro- 
phobic pattern and a minimum of 65% Leu, Ile, Val, and Ala in positions a 
and d to provide an output; MultiCoil uses PairCoil). In addition, the
programs cannot tell when the coiled coil under study is neither two- nor
three-stranded. One basic question that has not been analyzed to date is the
influence of the intracellular versus extracellular environment on the 
residue frequencies in the two types of structure. Almost as a rule, long 
oligomeric coiled coils are dimeric if they are found inside the cell and 
trimeric if they are outside. Thus, some of the success in distinguishing the 
two structures may come from overall differences in residue distribution, 
rather than from differential preferences in the heptad positions. 

The main structure-based programs used for the analysis of coiled coils 
are Socket (http://www.biols.susx.ac.uk/Biochem/Woolfson/html/coiled- 
coils/socket/; Walshaw and Woolfson, 2001) and Twister (Strelkov and 
Burkhard, 2002). In addition, Seo and Cohen (1993) have made their 
program for the computation of pitch and handedness available (contact
ccohen@brandeis.edu). Socket is designed to detect knobs-into-holes
packing in helical bundles, as the most direct way to evaluate the
compatibility of a structure with the standard model. The program oper- 
ates by representing side chains by their center of mass; they are classified 
as knobs if they contact four or more side-chain centers within a specified 
distance cutoff (set to 7.0 Å by default). Incidentally, the program assigns
an orientation, a register, and the number of constituent helices for each 
detected coiled coil. Most of the examples for coiled-coil structures shown 
in this Chapter were chosen from a scan of the Protein Data Bank (PDB) 
using Socket. The second program, Twister, is designed to list the local 
structural parameters of coiled-coil structures, based on Crick’s
parameterization. Twister uses four consecutive Cα carbons to determine the
position of the center for each helix, which then determine the location
of the supercoil axis. Once the axes are traced, all other parameters follow 
(Strelkov and Burkhard, 2002). Twister is very well suited to track local 
fluctuations in coiled-coil structures, but is limited by the requirement that 
the helices be in a parallel orientation. The parameters in Table I were 
computed using Twister. 
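The knob criterion described above for Socket can be sketched as a simple distance count. The example is a minimal illustration under stated assumptions and is not the Socket code: side-chain centers are supplied directly as coordinates, only centers on other helices are counted as contacts (an assumption made here), and the function and variable names are hypothetical.

```python
import math


def find_knobs(centers, cutoff=7.0, min_contacts=4):
    """Return keys of side chains that qualify as knobs.

    `centers` maps (helix_id, residue_number) to the (x, y, z) center of
    mass of a side chain, in angstroms. A side chain counts as a knob when
    at least `min_contacts` side-chain centers from other helices lie
    within `cutoff`.
    """
    knobs = []
    for key, pos in centers.items():
        contacts = sum(
            1 for other, other_pos in centers.items()
            if other[0] != key[0] and math.dist(pos, other_pos) <= cutoff
        )
        if contacts >= min_contacts:
            knobs.append(key)
    return knobs


# Toy data: one side chain on helix A surrounded by four centers on helix B.
toy_centers = {
    ("A", 1): (0.0, 0.0, 0.0),
    ("B", 1): (5.0, 0.0, 0.0),
    ("B", 2): (0.0, 5.0, 0.0),
    ("B", 3): (0.0, 0.0, 5.0),
    ("B", 4): (-5.0, 0.0, 0.0),
    ("B", 5): (30.0, 0.0, 0.0),
}
print(find_knobs(toy_centers))
```

In a real analysis the centers would be computed from the atomic coordinates of each side chain, and the knobs found would then be used to assign register, orientation, and the number of helices.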

The modeling of coiled-coil structures from sequence-based predictions 
can be performed with BeammotifCC (Offer et al., 2002), which uses 
generalized parametric equations in order to produce coordinates for 
the main chains of coiled coils, even in cases where their sequences differ 
locally or globally from the heptad pattern. Test models of various coiled 
coils from PDB (including GrpE, influenza hemagglutinin at low pH, and 
tetrabrachion) matched the experimentally determined coordinates with 
a deviation of less than 1.1 Å in backbone atoms, illustrating the accuracy
with which the parameterization can replicate the structure of real pro- 
teins. Coiled-coil structures have also been modeled successfully using 
established, template-based modeling techniques (Nilges and Brunger,
1991; O’Donoghue and Nilges, 1997), but these are neither more accurate,
nor do they have the computational simplicity and elegance of the
parametric approach. 
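The parametric approach can be illustrated with the classical Crick equations for a helix whose axis is itself wound into a supercoil. The sketch below is not BeammotifCC and does not implement the generalized equations of Offer et al.; it traces Cα positions for a single helix of a regular coiled coil. The default radii, pitch, and rise are illustrative assumptions, loosely modeled on the two-stranded values in Table I, and the function name is hypothetical.

```python
import math


def crick_ca_trace(n_residues, r0=4.9, pitch=135.0, left_handed=True,
                   r1=2.26, rise=1.5, periodicity=3.5, phase=0.0):
    """C-alpha coordinates of one helix of an idealised coiled coil.

    r0, r1      : supercoil and alpha-helix radii (angstroms)
    pitch       : supercoil pitch (angstroms)
    rise        : rise per residue along the supercoil axis (angstroms)
    periodicity : residues per turn relative to the supercoil axis
    """
    sign = -1.0 if left_handed else 1.0
    w0 = sign * 2.0 * math.pi * rise / pitch    # supercoil rotation per residue
    w1 = 2.0 * math.pi / periodicity            # helix rotation per residue
    alpha = math.atan(2.0 * math.pi * r0 * sign / pitch)  # signed pitch angle
    coords = []
    for k in range(n_residues):
        a0, a1 = w0 * k, w1 * k + phase
        x = (r0 * math.cos(a0) + r1 * math.cos(a0) * math.cos(a1)
             - r1 * math.cos(alpha) * math.sin(a0) * math.sin(a1))
        y = (r0 * math.sin(a0) + r1 * math.sin(a0) * math.cos(a1)
             + r1 * math.cos(alpha) * math.cos(a0) * math.sin(a1))
        z = rise * k - r1 * math.sin(alpha) * math.sin(a1)
        coords.append((x, y, z))
    return coords


trace = crick_ca_trace(28)
print(trace[0], trace[1])
```

Further helices of the same coiled coil can be generated by rotating this trace about the z axis (by 180° for a two-stranded structure, 120° for a three-stranded one), and nonheptad periodicities can be explored by changing `periodicity` and the handedness.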


C. Discontinuities 


Although coiled coils are generally very regular structures, only a few are
entirely without discontinuities (Fig. 2c). These can be represented as 
insertions of one or more residues into the heptad pattern. Insertions of 
one residue are called skips, three-residue insertions are stammers, and four- 
residue insertions are stutters. Other insertions may also occur, but are not 
frequent enough to have been named. Insertions can be accommodated 
structurally by perturbations in backbone continuity or in side-chain 
packing. Basically, insertions of three or four residues are close enough 
to a full turn of the helix (3.6 residues) to allow for their accommodation 
within the helical structure; however, they distort the knobs-into-holes 
interactions in the core. By inserting more than 3.5 residues, stutters raise 
the local sequence periodicity, leading to an unwinding of the left-handed 
supercoil; stammers have the opposite effect. Both have the same effect, 
however, on the packing geometry in the core: they alter the relative 
position of residues by changing the degree of supercoiling from the ideal 
value given by the standard model. Schematically, this can be illustrated as 
a rotation of the helices in a helical wheel diagram (Fig. 2c): stutters shift 
residues in position a towards the center of the core, resulting in a 
geometry called an x layer, while they shift residues in d out of the core 
and the following residues towards position a, resulting in a ring of 
interacting residues around a central cavity (a da layer). The situation 
for stammers is analogous, except that residues in position d yield the x 
layers, while the da layers are formed by residues in positions g and a. In 
both cases, the knobs-into-holes packing is transformed locally into a 
knobs-to-knobs interaction. If the insertion occurs close to a core position, 
that layer is the main one being distorted; however, if it occurs between 
core positions, two consecutive layers may be distorted (leading to the 
observation of consecutive x and da layers). By virtue of pointing towards 
the supercoil axis, x layers are constrained in the size of the side chains 
they can accommodate, particularly in the case of two-stranded coiled coils 
where it has been surmised that only alanine or glycine might fit (Lupas, 
1995; see also the alanine-containing x layer in the histone-like protein 
H-NS in Fig. 2c). However, a number of crystal structures illustrate the 
strategies by which two-stranded coiled coils compensate for this steric 
constraint and allow for the presence of other side chains (Fig. 4). The 
CbnR transcription factor accommodates its two x layers (Asp67 and Ser78)
by forming an antiparallel structure; it thus combines a larger and smaller
side chain in both x layers and orients them past each other (side chains
are angled towards the N-terminus and thus point in opposite directions).
In other proteins, all of them homooligomeric, x layers are accommodated
by symmetry breaks. In the chaperone cofactor GrpE, the x layers cause
the two helices to move out of register, even transforming locally the
knobs-into-holes packing into ridges-into-grooves. In the DNA repair
protein XRCC4, the x layer causes a change of direction of one helix relative
to the other. This effect is even more pronounced in the Rho-binding
domain of ROCK, in which the helices of two homologs (human and
bovine) take four different paths in order to accommodate two x layers,
going so far as to switch from left-handed to right-handed supercoiling
and back (this is also observable, to a lesser extent, in CbnR). The
location of the x layers is readily seen from the changes in direction of the
helices.

Fig. 4. Constrained knobs-to-knobs layers in two-stranded coiled coils. The figure
illustrates the strategies that coiled coils employ in order to accommodate x layers in
their structure: antiparallel orientation (CbnR), register shifts (GrpE), and symmetry
breaks (XRCC4 and ROCK). The constrained core layers are shown in cross-section
next to their place in the coiled coils.
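The bookkeeping behind skips, stammers, and stutters can be written down compactly. The sketch below is an illustration only, with hypothetical function names. It assumes that each heptad spans two helical turns and that a stammer or stutter absorbed within the helix adds exactly one further turn.

```python
NAMED_INSERTIONS = {1: "skip", 3: "stammer", 4: "stutter"}


def classify_insertion(n_inserted):
    """Name a discontinuity by its net register shift in a heptad frame."""
    shift = n_inserted % 7
    return NAMED_INSERTIONS.get(shift, "unnamed" if shift else "none")


def local_periodicity(n_inserted, heptads=1):
    """Residues per turn if the insertion is absorbed within the helix.

    Assumes two turns per heptad and one extra turn for the insertion.
    """
    if n_inserted not in (3, 4):
        raise ValueError("only stammers (3) and stutters (4) are handled")
    return (7 * heptads + n_inserted) / (2 * heptads + 1)


for n in (1, 3, 4, 8):
    print(n, classify_insertion(n))
print(local_periodicity(4), local_periodicity(3))
```

A heptad plus a stutter gives 11 residues over three turns, above the heptad value of 3.5, whereas a heptad plus a stammer gives 10 residues over three turns, below it. Because only the shift modulo 7 matters in a heptad frame, two stutters (eight residues) shift the register by the same amount as a single skip.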

Insertions other than stutters or stammers cannot be accommodated 
without altering the local helical structure. This occurs by preserving as far 
as possible the hydrogen-bonding pattern and knobs-into-holes packing of 
the helices and by looping out the extra residues. This is seen most 
dramatically in fibritin from bacteriophage T4 (Fig. 2c), where insertions 
of 4 and 12 residues are extruded with minimal disruptions in the conti- 
nuity of the helix. The extrusion of the four-residue insertion is surprising, 
given the arguments made above, but may have an energetic reason: both 
insertions are flanked by glycines, which are sterically required for the 
extrusion. Conversely, glycines are fairly rare in canonical coiled coils and 
do not favor an α-helical structure. We think that replacement of the
glycines with other residues may cause the incorporation of the four- 
residue insert as a stutter. More frequent are insertions of just one residue 
(skips), which may also be looped out to form a π-turn as in the thumb of
DNA polymerase I, or accommodated through a kink in the helix as in the 
GreA transcript cleavage factor (Fig. 2c). Skip residues tend to occur in 
antiparallel, single-chain coiled coils, which are often more irregular; we 
are not aware of any skip residues in parallel, oligomeric coiled coils of 
known structure. Many skip residues have been predicted in coiled coils 
from sequence analysis; in general, however, most seem to be structurally 
better represented by two successive stutters (Brown et al., 1996; Lupas 
et al., 1995; McLachlan and Karn, 1983; Seo and Cohen, 1993), which is 
not surprising since in a heptad reference frame, the insertion of twice 
four residues is equivalent to the insertion of a single residue. Indeed, one 
may argue that pairs of stutters arise most efficiently by the insertion of 
one residue, whose effects are then delocalized along the helix.

Fig. 5. Schematic diagram of coiled-coil periodicities in relation to the numbers of
residues per turn, the sequence repeat length, and the number of helical turns per
repeat. Transitions caused by the insertion of stutters are marked in green, and
transitions caused by stammers in blue. Periodicities for which we found examples in
the protein sequence database are connected by bold lines. Examples for the main
periodicities from the structure database are shown above and beneath the diagram with
cross sections illustrating core packing layer geometries. The region in which the
constituent helices of a coiled coil are expected to be essentially straight is marked with
a gray bar; periodicities above this bar cause a right-handed and periodicities beneath a
left-handed supercoil.


D. Coiled Coils that Deviate Globally from the Standard Model


Some coiled coils seem to contain discontinuities in a periodically
recurring pattern. For example, Holberton and colleagues detected
periodicities of 29, 25, and 24 residues in the cytoskeletal proteins of Giardia
(Holberton et al., 1988; Marshall and Holberton, 1993, 1995) and
interpreted these as four heptads plus a skip residue (29), three heptads plus a
stutter (25), and three heptads plus a stammer (24). In these proteins, the
discontinuities seemed to be spaced sufficiently far apart to suggest
structural models with local changes in the pitch and a meandering path of the
helices in the coiled coil. Such local changes in pitch angle have been
described for a number of structures (Seo and Cohen, 1993; Strelkov and
Burkhard, 2002), on occasion even leading to transitions from left-handed
to right-handed supercoils, as shown in Fig. 4. The crystal structure of a
protein with a 25-residue periodicity, the stalk of Sendai virus
phosphoprotein (Fig. 5), shows, however, that the nonheptad periodicity may
be accommodated by more global adjustments of the pitch, corresponding
to an overall periodicity of 3.57 (25 residues over 7 turns).

The possibility of such global departures from the heptad model was
first suggested by the discovery of proteins with 11-residue periodicities
(hendecads), such as the Lea proteins of plants (Dure III, 1993) and
the stalk of the archaeal surface protein tetrabrachion (Peters et al., 1996).
In these proteins, the overall hydrophobic periodicity of 3.67 (11/3)
exceeded the periodicity of an undistorted α-helix (3.63), suggesting a
right-handed twist for the supercoil. A detailed analysis of the
tetrabrachion stalk by biophysical and bioinformatic techniques revealed the
presence of a continuous homotetrameric coiled coil of 70 nm length
and extreme stability, which consisted of two segments separated by a
single proline residue (Peters et al., 1995, 1996). The N-terminal segment
showed a heptad periodicity with two closely spaced stutters (left-handed
supercoil), while the C-terminal segment had a clear hendecad periodicity
with one additional stutter and three stammers (right-handed supercoil).
In modeling the tetrabrachion stalk, the hendecads were treated as a
heptad plus a stutter; based on the effects of stutters on packing
interactions described above, this meant the addition of a da or x layer to each
heptad, depending on where the stutter was accommodated structurally
(using the heptad notation, the eleven positions of a hendecad, a-k, are 
equivalent to abcdabcdefg or to abcdefgxefg). An analysis of the average 
hydrophobicity of hendecad positions yielded high values for positions a 
and h, and intermediate values for positions d and e, pointing to a core
formed by a, d, e, and h. As confirmed by the crystal structure of a
tetrabrachion fragment (Stetefeld et al., 2000; Fig. 5), positions a and
h of the hendecad correspond structurally to positions a and d of the 
heptad, while the hendecad d and e positions form a knobs-to-knobs 
packing layer equivalent to a da layer. The same core layers are seen in 
the short coiled coils of HdeA (1BG8), which consists of a single hende- 
cad, and of the estrogen receptor ligand-binding domain (1PCG); 1PCG 
contains three hendecads, but diverges at the ends, so that only the central 
hendecad (and, to a lesser extent, the N-terminal hendecad) form regular 
core interactions. The second possibility, to give h an x layer geometry and
use three positions (a, d, and h) as the hydrophobic core, was realized in a 
designed protein (RH4) with the nonnatural amino acid alloisoleucine in 
position d. We are aware of one natural protein employing this packing 
mode: transcription factor CbnR (1IXG; Fig. 4), which accommodates the 
x (h) layers of its two hendecads by assuming an antiparallel orientation, as 
discussed in the previous section. 

The right-handed part of the tetrabrachion stalk contains a stutter within 
the hendecad frame, which introduces a 15/4 element (or pentadecad) and 
locally increases the right-handed supercoil by raising the periodicity to 3.75 
(Peters et al., 1996; Stetefeld et al., 2000; Fig. 5). In fact, given helices with 3.63
residues per turn, pentadecads are as strongly supercoiled to the right as 
heptads are to the left. The major adhesin of Yersinia, YadA, contains a stalk 
formed almost entirely of pentadecads (Hoiczyk et al., 2000); the crystal 
structure of the YadA head domain contains part of the first pentadecad of 
the stalk, revealing a structure similar to the stutter region of tetrabrachion, 
albeit three- rather than four-stranded (1P9H; Fig. 5). Two-stranded coiled 
coils formed by pentadecads have also been determined: the tetrameriza- 
tion region of Mnt repressor is an unusual dimer of dimers, formed by two 
antiparallel coiled coils, each two pentadecads long (1QEY; Nooren et al., 
1999). Homing endonucleases (e.g., 1G9Z) contain a short parallel, homo- 
dimeric coiled coil of just 12 residues, whose supercoil angle suggests an 
underlying pentadecad periodicity. For such short coiled coils, side-chain 
packing does not allow us to distinguish between hendecads and pentade- 
cads, but the comparison between HdeA and homing endonucleases, which 
are of the same length and favor the same core residues (Gly in d, aromatic in 
e, Gly or Ala in h) reveals a clearly larger supercoil angle in endonucleases. As
in the CbnR repressor discussed earlier, the constrained nature of the x-like 
knobs-to-knobs layers found in Mnt repressor seems to favor an antiparallel
orientation of the helices, as the main alternative to symmetry breaks. In
homing endonucleases, the short length of the coiled coil and a preference 
for Gly and Ala in the knobs-to-knobs core layers obviate these constraints. 

Hendecads and pentadecads are part of a larger picture (Gruber and 
Lupas, 2003). Database scans for putative coiled-coil sequences yield a fair 
number of instances in which the primary periodicity is not 7/2; the more 
common ones are 11/3, 15/4, 17/5, and 20/6. A plot of these periodi- 
cities shows that they are all related to the heptad repeat by regular 
insertions of three or four residues (i.e., stammers and stutters; Fig. 5). 
Indeed, the structure database holds examples for a surprising number of 
nonheptad periodicities, including 10/3, 11/3, 15/4, 18/5, and 25/7. In 
all cases, the periodicity can be decomposed into elements of three and 
four residues, whose hydrophobic pattern and residue composition is 
fundamentally compatible with the basic coiled-coil structure (Hicks
et al., 1997), and Crick’s parameterization can be adapted easily to this
generalization (Harbury et al., 1998; Offer et al., 2002; Peters et al., 1996). 
Alternating elements (3-4 and 4-3) yield knobs-into-holes packing, whereas 
consecutive elements of the same kind (3-3 and 4-4) yield knobs-to-knobs 
packing. Since four residues are more than a helical turn and three residues 
are less, and since the α-helix is right-handed, it follows that 4-4 patterns
bias the structure towards a right-handed supercoil and 3-3 patterns towards 
a left-handed one. The graph in Fig. 5 shows how the progressive insertion of 
4-4 patterns raises the average number of residues per turn, until a coiled
coil crosses the Rubicon at around 3.63 and switches to a right-handed
supercoil; the progressive insertion of 3-3 patterns has the opposite effect. 
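The decomposition into three- and four-residue elements lends itself to a short worked example. The sketch below treats each element as one helical turn, uses the value of 3.63 residues per turn quoted in the text as the threshold between left- and right-handed supercoiling, and assigns the packing type at each junction between consecutive elements, treating the repeat as cyclic. The function name and the output format are hypothetical.

```python
HELIX_PERIODICITY = 3.63  # residues per turn of an undistorted helix (from the text)


def analyse_elements(elements):
    """Summarise a sequence repeat built from 3- and 4-residue elements.

    Returns (repeat length, turns, residues per turn, handedness, packing),
    where `packing` lists the layer type at each junction between
    consecutive elements, treating the repeat as cyclic.
    """
    if not elements or any(e not in (3, 4) for e in elements):
        raise ValueError("elements must be 3 or 4 residues long")
    length, turns = sum(elements), len(elements)
    periodicity = length / turns
    if periodicity > HELIX_PERIODICITY:
        handedness = "right-handed"
    elif periodicity < HELIX_PERIODICITY:
        handedness = "left-handed"
    else:
        handedness = "straight"
    packing = []
    for i, current in enumerate(elements):
        following = elements[(i + 1) % turns]
        packing.append("knobs-into-holes" if current != following
                       else "knobs-to-knobs")
    return length, turns, periodicity, handedness, packing


for name, pattern in [("heptad", [3, 4]), ("hendecad", [3, 4, 4]),
                      ("pentadecad", [3, 4, 4, 4]),
                      ("25/7", [3, 4, 3, 4, 3, 4, 4])]:
    length, turns, per, hand, packing = analyse_elements(pattern)
    print(f"{name}: {length}/{turns} = {per:.2f}, {hand}, "
          f"{packing.count('knobs-to-knobs')} knobs-to-knobs layer(s)")
```

In this scheme the heptad has only alternating junctions, the hendecad acquires one knobs-to-knobs layer per repeat, and the pentadecad two. The 25/7 repeat remains left-handed despite containing one such layer.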

It is unclear how many successive elements of the same kind can be used 
to obtain a coiled coil, but we have seen no evidence of coiled coils with 
periodicities of 4/1 or 3/1, suggesting that there are limits to the degree 
of distortion to which α-helices can be subjected. Indeed, most of the
nonheptad periodicities we observed bring the coiled coils closer to the 
unsupercoiled state, rather than increasing the degree of supercoiling; 
thus, the 7/2 and 15/4 periodicities seem to bracket the main range in 
which coiled coils are found. [Interestingly, 7/2 and 15/4 were the main 
periodicities proposed by Pauling and Corey (1953), whose opposite 
handedness they also recognized.] One may envisage two energetic effects: 
the penalty incurred by distorting the α-helices into a supercoil and the
one incurred by deviating from knobs-into-holes packing. In two-stranded 
coiled coils, which represent the large majority of all structures, the 
packing penalty outweighs the supercoil penalty, so that most show peri- 
odicities close to 7/2 and thus to the lower end of the bracket. Three- and 
particularly four-stranded coiled coils do not incur the same penalties for 
knobs-to-knobs packing, due to the greater amount of space available in
their core layers. Correspondingly, they much more frequently show
periodicities deviating from the canonical heptad repeat. (As an aside,
they are also harder to detect and analyze, since most computational
tools have problems with sequences and structures that deviate from the
standard model.)


III. STRUCTURAL DETERMINANTS OF FOLDING AND STABILITY


A. Number of Helices 


The oligomeric state of a coiled coil is determined by packing interac- 
tions, and thus depends primarily on the nature of residues in positions a, 
d, e, and g. The geometry of side chains in positions a and d differs 
systematically between two-, three-, and four-stranded coiled coils, as dis- 
cussed in Section II.A; correspondingly, they also have different side chain 
preferences. In two-stranded coiled coils, the core packing geometry favors 
β-branched residues in a and unbranched or γ-branched residues in d; in
four-stranded coiled coils, the situation is exactly reversed. Three-stranded 
coiled coils, in contrast, have a more uniform geometry in both a and d 
layers (called acute), which makes them the least selective. In a series of 
experiments with the GCN4 leucine zipper, Harbury et al. (1993, 1994) 
showed that the structure could be switched between two-, three-, and 
four-stranded states by directed changes in the core residues. Isoleucine in 
a and leucine in d produced dimers and the reverse produced tetramers; 
all other combinations of isoleucine, leucine, and valine produced trimers 
or, in some cases, mixtures of dimers and trimers. A subsequently obtai- 
ned retro-GCN4 structure, which represents the inverted sequence of 
the GCN4 zipper and thus reverses the positions of the core residues in 
the heptad pattern, unsurprisingly yielded a tetramer (Mittl et al., 2000). 
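The outcome of the core-substitution experiments summarized above can be encoded as a small lookup. The function below restates the results as described in this section for uniform isoleucine, leucine, and valine cores in the GCN4 zipper; it is not a general predictor of oligomer state, and its name and return strings are hypothetical.

```python
def gcn4_core_oligomer(a_residue, d_residue):
    """Oligomer state described in the text for GCN4 core variants.

    `a_residue` and `d_residue` are one-letter codes for the residue placed
    at all a positions and all d positions, respectively.
    """
    allowed = {"I", "L", "V"}
    if a_residue not in allowed or d_residue not in allowed:
        raise ValueError("the summary covers only Ile, Leu and Val")
    if (a_residue, d_residue) == ("I", "L"):
        return "dimer"
    if (a_residue, d_residue) == ("L", "I"):
        return "tetramer"
    return "trimer (or dimer/trimer mixture)"


for a in "ILV":
    for d in "ILV":
        print(a, d, gcn4_core_oligomer(a, d))
```

The table printed by the loop makes the asymmetry explicit: only two of the nine combinations select a specific dimeric or tetrameric state, while the remainder fall to the trimeric or mixed outcome.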

The GCN4 sequence contains an additional oligomerization deter- 
minant: a single polar residue (Asn) in position a. As Harbury et al. 
(1993) found, mutation of this residue to valine caused a loss of structural 
specificity and led to mixtures of dimers and trimers. Further mutations at 
this position to glutamine, lysine, norleucine, and aminobutyric acid 
(Gonzales et al., 1996a,b), as well as the related investigation of two lysines 
in position a of the Fos leucine zipper (Campell et al., 2002) yielded 
inconsistent results, suggesting that polar residues may promote oligomer- 
ization specificity but that their effect is sensitive to the surrounding 
structural environment. In general, increased hydrophobicity at the heli- 
cal interface disfavored dimer formation. These findings are consistent 
with the differences in residue frequencies for two- and three-stranded
coiled coils observed by Conway and Parry (1990, 1991), namely a strong
decrease in the proportion of charged residues in the hydrophobic core of 
three-stranded coiled coils, as well as a more general loss of positional 
preferences for most amino acids. One may speculate that the three- 
stranded state represents a default setting for the coiled coil, and that 
dimers and tetramers require specific structural determinants for their 
formation. 

Much less is known about the role of residues in positions e and g in
specifying oligomer states. Mutation of residues in position e of a leucine 
zipper peptide to alanine resulted in the formation of tetramers (Krylov 
et al., 1994), as did the disruption of an interchain ionic interaction in the 
trimeric coiled coil of cartilage matrix protein (Beck et al., 1997). However, 
it is currently difficult to do more than speculate on the possibility that 
increased hydrophobicity and/or decreased side-chain size could promote 
trimer and tetramer formation: short side chains by being less able to 
shield the hydrophobic core in a dimeric structure, and hydrophobic side 
chains by favoring extended packing interactions. Finally, we think that 
knobs-to-knobs core layers also favor the formation of higher oligomers 
because of their constrained nature in two-stranded coiled coils, but direct 
evidence for this conjecture is lacking. 


B. Orientation and Oligomer Specificity 


The orientation and preference for homo- or heterooligomeric associa- 
tion of helices in a coiled coil is determined primarily by ionic and polar 
interactions between residues in positions e and g, and also by interactions 
between these residues and polar residues in a and d [see, for example, the
Fos-Jun heterodimer (Glover and Harrison, 1995) in comparison with the
Jun homodimer (Junius et al., 1996)]. Most interactions occur between
residues in g of one heptad and e of the next heptad (interactions
between positions e and g of the same heptad being generally prevented by
the shape of the core). In four-stranded coiled coils, positions b and c may
also contribute, as here the core is much broader. The same interactions 
are also critical in determining whether a coiled-coil helix will form 
homooligomers or will preferentially heterooligomerize with a different 
helix. These effects have been studied in great detail with designed pep- 
tides and are discussed in this volume by Woolfson. There are, conversely, 
rather few results with natural sequences, probably because their low 
degree of redundancy makes the interpretation of outcomes difficult. 

Comparative analyses of the number of favorable interactions in various 
arrangements have been made to distinguish the register and oligomer 
specificity of coiled coils, such as tropomyosin (McLachlan and Stewart,
1975), keratin (Parry et al., 1977) and laminin (Beck et al., 1993), and have
also been used to predict the heterodimerization partners for leucine
zippers (Vinson et al., 1993). Extending these efforts, a quantitative method
based on pairwise residue interactions for positions a, d, e, and g has
been developed by Fong et al. (2004) and is available on the web (http:// 
compbio.cs.princeton.edu/bzip/). The inclusion of positions a and d 
is not only justified by the already mentioned possibility of a-g’ and d-e’
interactions, but also by some (albeit few) results suggesting an influence 
of the hydrophobic interactions: in the GCN4 leucine zipper, mutation of 
the core asparagine (Asn16) to alanine results in an antiparallel trimer, 
which replaces the expected Ala-Ala core layer with a Leu-Leu-Ala and
an Ala-Ala-Leu layer, presumably to reduce the effects of cavity formation
in the core (Holton and Alber, 2004). Monera et al. (1996) have described 
a similar alanine mutation, which switches the orientation of helices in 
a designed tetramer in order to form mixed Leu-Ala core layers. Core 
packing is probably also responsible for the antiparallel nature of the 
synthetic peptide Coil-Ser (Lovejoy et al., 1993), whose single tryptophan
residue is too bulky to fit three times into the same core layer. 
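Counting favorable and unfavorable interactions for a given arrangement can be sketched as follows. The example is a deliberately simple illustration and is not the method of any of the studies cited above. It assumes a parallel, two-stranded coiled coil with both helices aligned in register and starting at position a, considers only formal charges, and pairs position g of each heptad with position e of the following heptad on the other helix. All names are hypothetical.

```python
POSITIVE, NEGATIVE = set("KR"), set("DE")


def charge(residue):
    """Formal side-chain charge used for the toy count."""
    if residue in POSITIVE:
        return 1
    if residue in NEGATIVE:
        return -1
    return 0


def ge_interactions(helix1, helix2):
    """Count attractive and repulsive g-e' pairs in a parallel dimer.

    Both sequences must start at heptad position a and be aligned in
    register. Position g of heptad i on one helix is paired with position
    e of heptad i + 1 on the other helix, and vice versa.
    """
    attractive = repulsive = 0
    n_heptads = min(len(helix1), len(helix2)) // 7
    for first, second in ((helix1, helix2), (helix2, helix1)):
        for i in range(n_heptads - 1):
            g = charge(first[7 * i + 6])
            e = charge(second[7 * (i + 1) + 4])
            if g * e < 0:
                attractive += 1
            elif g * e > 0:
                repulsive += 1
    return attractive, repulsive


acid = "LAALEAE" * 4   # Glu at e and g
base = "LAALKAK" * 4   # Lys at e and g
print(ge_interactions(acid, acid), ge_interactions(acid, base))
```

Repeating the count for shifted registers or for different partner combinations gives the kind of comparison described above, in which the arrangement with the most favorable and fewest repulsive pairs is taken as the most likely one.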

An interesting light is thrown on the relationship between designed and 
natural sequences by Arndt et al. (2000), who used libraries based on the
sequences of Jun and Fos to randomize positions e and g to Gln, Glu, Arg, 
and Lys and position a to Asn or Val, prior to selecting for heterodimers. 
They observed an enrichment of favorable interactions in the selected 
sequences, but even the best heterodimer they identified, WinZip-A1B1,
still contained two repulsive e-g interactions, mirroring much more closely
the situation observed in natural sequences than the perfect matches 
engineered into designed peptides. A further study attempting to optimize 
WinZip-A1B1 by genetic selection and, alternatively, by rational design
found that the best genetically obtained pair retained the repulsive e-g 
interactions and that it was more effective than the designed pair in 
mediating heterodimerization in vivo (Arndt et al., 2002). The authors 
concluded that the effects of predicted charge pairs depend on sequence 
context, and complementary charges at the edge positions rationalize only 
a fraction of the sequences that form stable, specific coiled coils. 


C. Folding and Stability 


Fragments of natural coiled coils, even when they are of considerable 
size, rarely fold [see, for example, Trybus et al. (1997) for an analysis of 
the myosin rod]. This may well serve a biological purpose, since a coiled 
coil the size of myosin might find it difficult to reach its proper register 
if local coiled-coil interactions formed rapidly and randomly along the 
1000-residue sequence. Still, it raises the question of how the native coiled 
coil folds. An intriguing hypothesis was provided by Steinmetz et al. (1998) 
with the trigger sequence, a short autonomous helical folding unit favoring 
oligomerization. They observed that the folding of cortexillin I, whose 
coiled-coil domain contains 18 heptads, was entirely dependent on the 
presence of two heptads close to the C-terminal end, which they proposed 
triggered the folding of the coiled coil. The crystal structure of the 
cortexillin I coiled coil revealed an abundance of favorable inter- and 
intrahelical ionic interactions in the trigger region, as well as tight packing 
interactions in its hydrophobic core (Burkhard et al., 1998). The authors 
identified potential trigger sequences in several other coiled coils, based 
on sequence similarity to the cortexillin trigger, and successfully tested the 
putative trigger they identified in GCN4 (Kammerer et al., 1998). Since 
then, trigger sequences have been identified in other proteins, such as the 
macrophage scavenger receptor (Frank et al., 2000), intermediate filament 
proteins (Wu et al., 2000), hantavirus nucleocapsid protein (Alfadhli 
et al., 2002), and tropomyosin (Araya et al., 2002). However, the trigger 
sequences in the various proteins show considerable diversity; it is there- 
fore unlikely that a consensus sequence for trigger sites exists. Rather, as 
shown by Lee et al. (2001) for a cortexillin/GCN4 hybrid lacking the 
trigger sites, folding could be obtained by a combination of stabilizing 
mutations that improved helicity, electrostatic interactions, and hydropho- 
bic packing, but did not bring the final sequence closer to the trigger 
consensus. Thus, trigger sequences may correspond to short regions in 
coiled coils, which can serve as nucleation sites for folding due to their 
unusual α-helical stability and high number of interactions stabilizing the 
oligomeric form. They will only adhere to a consensus as far as the 
principles guiding coiled-coil stability can be reduced to a consensus 
sequence. 

The main properties that provide stability to a coiled coil are helical 
propensity, hydrophobicity of the core, tightness of the core packing, 
shielding of the core from solvent, and favorable polar and ionic interac- 
tions. Beyond general statements, the detailed contribution of residues to 
the stability of natural coiled coils is not well understood. Efforts to 
elucidate the contribution of different residues, such as the host-guest 
studies of Hodges on the effects of the 20 amino acids at positions a and d 
of a model peptide (Tripet et al., 2000; Wagschal et al., 1999), ignore the 
complementarity of side-chain packing and the importance of sequence 
context for judging the effects of a substitution. Correspondingly, the 
results correlate poorly with the residue frequencies observed in natural 
coiled coils. It may be argued, however, based on the experience from 
engineering stable repetitive proteins (Kohl et al., 2003; Main et al., 2003; 
Mosavi et al., 2002), that the consensus sequence embodies the most stable 
form of that particular fold. In that respect, it is noteworthy that coiled 
coils in archaea (e.g., prefoldin), which have a greater stability than their 
homologs in eukaryotes, match consensus matrices of coiled-coil se- 
quences better, even though they did not contribute to them (A. Lupas, 
unpublished observation). 

An effect that clearly enhances the stability of coiled coils is the 
number of helices in the structure. More helices mean a broader hydro- 
phobic interface, better shielding of the core, and more opportunities 
for polar and ionic interactions by the involvement of positions b and c. 
GCN4-pIL, which forms dimers, is thus less stable than the practically 
sequence-identical GCN4-pLI, which forms tetramers (Harbury et al., 
1993). Similarly, the Fos homodimer is much less stable than the tetramer, 
obtained by mutation of two core lysine residues to norleucine (Campbell 
et al., 2002). Two of the most stable coiled coils known to us, tetrabrachion 
and Sendai virus phosphoprotein, are both tetramers. Denaturation of the 
tetrabrachion coiled coil requires 70% sulfuric acid, fuming trifluoro- 
methanesulfonic acid, or heating to 130°C for 30 minutes in 6 M guani- 
dinium hydrochloride (Peters et al., 1995). Sendai virus phosphoprotein 
withstands extensive deletions, point mutations to proline in core resi- 
dues, and two-residue insertions that move the hydrophobic seam to the 
opposite face of the helix (Tarbouriech et al., 2000). In both structures, 
the interactions between neighboring helices appear to provide such a 
degree of stability that they can overcompensate for the formation of core 
cavities. Heretically, both structures contain considerable quantities of 
water in their “hydrophobic” core. 


IV. STRUCTURAL DIVERSITY 


A. Fibers and Zippers 


Historically, coiled coils were identified with long fibrous molecules, 
from which their structural properties had been determined. Fiber dif- 
fraction studies on proteins of the k-m-e-f class were highly successful, 
initially on dried specimens but later also on native samples (Cohen and 
Holmes, 1963). However, these proteins turned out to be very difficult to 
analyze by high-resolution X-ray crystallography for the same reasons that 
made them so amenable to fiber diffraction—their tendency to aggregate 
into fibers rather than crystals and the extreme dimensions of their 
asymmetric units. It took decades to obtain a working structure for tropomyosin 
[at 15 Å resolution (Phillips et al., 1986); at 9 Å (Whitby et al., 
1992); at 7 Å (Whitby and Phillips, 2000); N-terminal fragment at 2 Å 
(Brown et al., 2001); C-terminal fragment at 2.7 Å (Li et al., 2002)], and a 
high-resolution structure for the entire molecule still remains to be deter- 
mined. Arguably, the only k-m-e-f protein with a known high-resolution 
structure is fibrinogen (Brown et al., 2000; Yang et al., 2000, 2001). 

Not surprisingly, the Protein Data Bank primarily offers examples of 
short coiled coils. The first ones to be determined at high resolution were 
the three-stranded coiled coil in influenza hemagglutinin (Wilson et al., 
1981) and the two-stranded coiled coil in the catabolite gene activator 
protein CAP (McKay and Steitz, 1981), but the coiled coil in CAP was only 
recognized a decade later (Nilges and Brunger, 1991). In 1988, when 
the leucine zipper hypothesis was formulated (Landschulz et al., 1988), 
the coiled-coil crystal structures in PDB could still be counted on the 
fingers of one hand. The leucine zipper changed research into coiled 
coils fundamentally by refocusing the studies from long fibers to a broad 
range of shorter coiled coils found in globular proteins. The leucine 
zipper also provided an ideal vehicle for exploring issues in coiled-coil 
packing, oligomerization, folding, and stability, which were discussed in 
the previous section and, in greater detail, in Chapter 4 by Woolfson. 
Originally, the leucine zipper was proposed to be an antiparallel arrange- 
ment of two helices, which dimerized by the ridges-into-grooves meshing 
of four leucines in a heptad spacing (Landschulz et al., 1988), but it was 
soon recognized as a new short form of the coiled coil, in which the 
stability was derived from the optimized packing of the leucines in position 
d (O’Shea et al., 1991). Since then, the number of coiled-coil structures in 
PDB has increased dramatically. The two main groups recognizable today 
are oligomeric structures, which conform to the original ideas about 
coiled coils, and single-chain antiparallel helical bundles, which were only 
gradually recognized to form a valid subgroup of the coiled coil proteins 
(Cohen and Parry, 1986, 1990) (Fig. 1). The leucine zipper can now be 
seen to belong to a much larger group of short, dimeric coiled coils, 
parallel and antiparallel, which serve to oligomerize and position many of 
the most common DNA-binding domains (Fig. 6c), including the basic 
region, helix-turn-helix (HTH) domains (such as in CAP), and zinc fingers. 


B. Tubes, Sheets, Spirals, Funnels, and Rings 


Coiled-coil helices usually have one stripe of residues engaged in knobs- 
into-holes interactions. There are, however, instances where a helix may 
engage in such interactions along two stripes (Cohen and Parry, 1990; 
Walshaw and Woolfson, 2003; Walshaw et al., 2001). For example, helices 
of four-stranded coiled coils make their primary knobs-into-holes contacts 
with their neighboring helices, using positions a and e on one side and d 
and g on the other. The diagonally opposite helix is furthest away and 
makes the fewest contacts, frequently to the point where the 
center of the coiled coil contains cavities. These are sometimes filled 
by water or other molecules, giving the coiled coil the nature of a tube 
(Figs. 5 and 7). The tube-shaped character of such coiled coils is naturally 
enhanced as the number of helices grows, and with it the diameter of the 
tube. Figure 7 shows examples of tube structures with 4, 5, 6, and 12 
helices, the last forming an opening with a diameter of about 20 Å. 

Fig. 6. Coiled coils that “grow” out of other folds. (a) Response regulators of two- 
component signal transduction (TCST) systems dimerizing through the extension of 
helix α5. (b) Dimerization interface in a GTP-binding protein by extension of helix α4. 

As the position of the two stripes diverges on the helix surface towards 
opposite sides, the association takes the form of a sheet. The structure 
database offers multiple examples of such sheets, usually containing three 
or four helices (Fig. 8). As can be seen from their cross sections, forming 
knobs-into-holes interactions on opposite sides of one helix leads to 
“supercoil schizophrenia,” since the interactions require supercoiling in 
opposite directions. This results in a straight conformation of the central 
helices and an increased distortion of the flanking helices. In turn, this 
means that the helices gradually move out of register, so that a regular 
packing can only be achieved approximately and over a limited distance. 

Sheets meet tubes in the formation of spirals. For example, the major 
coat protein subunits of filamentous bacteriophage assemble into curved 
sheets that further assemble into a multistranded spiral (Fig. 9). Coat 
proteins from different bacteriophages differ in the extent to which they 
engage in knobs-into-holes interactions and in the degree of curvature of 
the sheets. Correspondingly, different assemblies use different numbers of 
sheets to form the spiral (five in phage Ike and six in phage Pf1). Similar 
principles guide the assembly of coiled-coil spirals in two types of bacterial 
surface structures, flagella and pili. For the flagella of Salmonella typhimur- 
ium, Namba and colleagues determined the crystal structure of the flagel- 
lin subunit (Samatey et al., 2001) and then used electron cryomicroscopy 
to place the subunits into the R form of the filament (Yonekura et al., 
2003); Tainer and colleagues used a similar approach to image the type IV 
pili of Vibrio cholerae and Pseudomonas aeruginosa (Craig et al., 2003). A 
special case of a spiral is given by the multidrug resistance protein MexA, 
whose crystal structure shows an unusual, hourglass assembly of two spiral 
arcs packed end to end (Fig. 9). MexA belongs to a class of periplasmic 
adaptor molecules that connect inner-membrane transporters to outer 
membrane efflux pores, such as TolC (Fig. 7), yielding an efflux pump 
complex that spans the entire envelope of Gram-negative bacteria. For 
biological reasons, this structure is thought to be an artifact of crystalliza- 
tion, and the in vivo structure is thought to resemble a funnel (Higgins 
et al., 2004). Another funnel-shaped assembly based on coiled-coil inter- 
actions is formed by the upper collar protein of Bacillus bacteriophage 
φ29, which is part of the motor that translocates the phage DNA into the 
precursor capsid (Fig. 10). 

Fig. 6 (continued). (c) Coiled-coil mediated dimerization in transcription factors: the basic region-helix- 
loop-helix (bHLH) DNA-binding domain (from MyoD) and coiled coils obtained by 
extending either the basic region helices (from Fos) or the helices of the HLH motif 
(from Max). The helix-turn-helix DNA-binding domain (from lambda repressor) and 
its elaboration with a coiled coil (in cueR). 

Fig. 9. Coiled-coil spirals. For the phage coat proteins and flagellin, subunits are 
shown enlarged next to the structures, as well as the cross sections of the coiled-coil 
sheets they form. The positions of the subunits in the structures are indicated in white. 
The core packing layers are also shown for the phage coat proteins in order to illustrate 
the use of knobs-into-holes and ridges-into-grooves layers. 

A protein that closes the circle back to fibers, from which the coiled-coil 
structure took its start, is apolipoprotein A-II (Fig. 11), an astonishing 
complex formed by subunits of 77 residues with an underlying 22-residue 
periodicity. Four subunits assemble into a highly irregular, parallel bun- 
dle, which shows knobs-into-holes interactions only locally and only in 
pairwise helical contacts (but has the supercoil angle expected for a 
periodicity of 3.67). The helices have multiple symmetry breaks, inclu- 
ding a most pronounced one in a central area that unfolds to different 
degrees in the “inner” and “outer” subunits, thus bending the bundle. 
The bent bundles assemble head-to-tail into a spiral-shaped fiber. Despite 
a homologous origin and a common underlying 22/6 periodicity (Boguski 
et al., 1985), apolipoproteins show an astounding structural diversity: 
Apolipoprotein A-I is built of 251-residue subunits that also assemble into 
long four-stranded helical bundles with limited knobs-into-holes interac- 
tions and a right-handed supercoil twist, albeit in an antiparallel orienta- 
tion and in a staggered way, such that no two N-termini are in the same 
register. This arrangement results in the formation of a single twisted ring 
(Fig. 11). In contrast, apolipoprotein E N-terminal domain (191 residues) 
and the insect apolipophorin III (180 residues) form monomeric, single- 
chain helical bundles with a left-handed supercoil twist and fairly regular 
knobs-into-holes interactions (Fig. 11). 
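
The relationship between sequence periodicity and expected supercoil handedness that underlies this comparison can be illustrated with a short calculation. A repeat of n residues spread over t helical turns corresponds to a periodicity of n/t residues per turn. When this value lies below the periodicity of an undistorted α-helix, the stripe of core residues drifts around the helix in a way that is accommodated by left-handed supercoiling; when it lies above, by right-handed supercoiling. The sketch assumes a value of 3.63 residues per turn for the undistorted helix, and the function name is hypothetical. It predicts only what the periodicity alone would suggest, which, as the apolipoproteins show, need not be what a given structure adopts.

```python
# Residues per turn of an undistorted alpha-helix (assumed value for this sketch).
ALPHA_HELIX_PERIODICITY = 3.63


def expected_supercoil(residues, turns):
    """Return (periodicity, handedness) for a repeat of `residues` over `turns` turns."""
    periodicity = residues / turns
    if periodicity < ALPHA_HELIX_PERIODICITY:
        handedness = "left-handed"
    elif periodicity > ALPHA_HELIX_PERIODICITY:
        handedness = "right-handed"
    else:
        handedness = "straight"
    return round(periodicity, 2), handedness


# The heptad repeat (7/2) and the 22/6 repeat discussed for the apolipoproteins.
for residues, turns in ((7, 2), (22, 6)):
    print(f"{residues}/{turns}:", expected_supercoil(residues, turns))
```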


C. What Exactly is a Coiled Coil? 


The diversity of coiled coils presented in this Chapter raises for us the 
question: can one hope to distinguish unambiguously every such structure 
from all others? A priori, the peculiar and highly regular structure implied 
by the standard model would suggest that the answer should be yes; knobs- 
into-holes packing, supercoiling, symmetric arrangement of the helices, 
low crossing angle, length of the structure, and distinctness from other 
domains should provide multiple criteria by which to judge whether a 
bundle of helices forms a coiled coil. In practice, there is a continuum of 
structures that stretch along an imaginary axis of “coiled-coiledness” and 
for any criterion, protein domains can be found that violate it and yet 
would be accepted in general as coiled coils. Regarding distinctness 
from other domains, we have compiled examples of protein families in 
Fig. 6, in which coiled-coil domains “grow” through the extension of 
helices that are integral components of other folds, such as helix α5 of the 
flavodoxin fold in bacterial response regulators (Fig. 6a) or helix α4 of the 
P-loop NTPase fold in YchF (Fig. 6b). In Fig. 12, coiled coils arise within 
the folds themselves, for example in the flavodoxin-like N-terminal do- 
main of formyl-CoA transferase (1P5H), which dimerizes via coiled-coil 
interactions in helix ab. 

Fig. 10. A coiled-coil funnel. Panel label: Bacteriophage φ29 Upper Collar Protein (1FOU). 

Fig. 11. Structure diversity in apolipoproteins. Despite being of homologous origin 
and sharing an underlying 22-residue sequence periodicity, different apolipoproteins 
show fundamental differences in oligomerization state, subunit orientation, and 
supercoil angle. Panel labels: Apolipoprotein A-II (1L6L); Apolipoprotein E (1OR3); 
Apolipophorin-III (1LS4). 

Regarding length of the structure, it is certainly the case that many 
classical coiled coils are very long and that even the leucine zipper, 
once considered to mark the short end of the coiled-coil distribution, 
is four heptads long. There are however several rather clear coiled 
coils, which are only ten or so residues long, such as in Gal4 (1D66; 14 
residues) or the previously discussed homing endonucleases (e.g., 1G9Z; 
12 residues). Regarding the crossing angle, many “square” helical bundles 
(following the nomenclature of Harris et al., 1994) have the same 
crossing angle as canonical coiled coils (around 20 degrees) but show 
primarily ridges-into-grooves interactions, as do most helical bundles 
found in the membrane. Conversely, nonheptad coiled coils show a 
diversity of crossing angles, from about 25 degrees to -25 degrees. 
Regarding a symmetric arrangement of helices, we have already encoun- 
tered in this Chapter many examples of symmetry breaks, and the same 
may be said of supercoiling and the pitch diversity of coiled coils. 

This brings us to the one property that is usually regarded as the 
hallmark of coiled-coil structure: knobs-into-holes packing. As we have 
shown earlier, knobs-to-knobs interactions are an integral part of nonhep- 
tad coiled coils. One may however still judge that coiled coils differ from 
other structures by the fact that their core layers move in register, rather 
than showing the staggered arrangements of ridges-into-grooves. Here 
again, though, there are many exceptions. In spectrin (Fig. 2b), two 
helices form regular knobs-into-holes interactions, while the third is out 
of register. In GrpE (Fig. 4), a parallel, homodimeric coiled coil moves out 
of register after a stutter to show local ridges-into-grooves interactions, 
then returns to knobs-into-holes packing. In apolipoproteins A-I and A-II, 
ridges-into-grooves coexist with knobs-into-holes to produce some highly 
unusual tetrameric coiled coils. Or are these still coiled coils? We find 
that there is no criterion or set of criteria that would allow us to resolve 
that question. 


V. FUNCTION FOLLOWS STRUCTURE 


More clearly than in many other protein folds, the function of coiled 
coils follows from their main structural properties. Coiled coils are usually 
long, rigid oligomers of helices with regular packing interactions and 
extended exposed surfaces. This enables them to assemble into large, 
mechanically rigid structures such as hair, horn, feathers (keratin), and 
blood clots (fibrinogen); extracellular matrices (laminin) and cytoskeletal 
networks (intermediate filaments); and a broad array of filaments (flagel- 
lins, pilins, phage coat proteins). They are an efficient way to project across 
large distances and are therefore often found in molecular spacers (murein 
lipoprotein, Ompα) and the stalks of surface proteins (nonfimbrial adhe- 
sins, viral fusion proteins). Their rigidity and large exposed surfaces 
allow them to function as the arms of proteins involved in the mani- 
pulation of polynucleotides (seryl-tRNA synthetase, DNA polymerase I) 
and of unfolded polypeptides (ClpB, prefoldin). These arms are single- 
chain, antiparallel hairpins of helices that, by lacking sequence symmetry, 
can adapt more precisely to the interaction with asymmetric ligands and 
can move more readily around a hinge (in a homooligomer, any such 
motion would involve a symmetry break) (Martin et al., 2004). Above 
all, coiled coils are ideal mediators of oligomerization, assuming this role 
in a multitude of proteins such as transcription factors (leucine zippers), 
molecular motors (myosin, kinesin), receptors (macrophage scavenger 
receptor, chemotaxis receptors), and signaling molecules (G protein 
Gγ). Their regular packing interactions make these interchangeable to a 
much larger extent than in most proteins. Correspondingly, they may 
change their oligomer state and conformation in response to changes 
in their environment, an ability that is exploited most impressively in 
membrane fusion proteins, as illustrated by the structures of influenza 
hemagglutinin at neutral (2HMG) and acidic (1HTM) pH. 

Fig. 12. Coiled coils arising between helices that are part of different folds. (a) 
Soluble proteins. (b) Example of a membrane protein. Inner membrane proteins are 
α-helical proteins with an up-and-down topology; their helices therefore favor low 
crossing angles and frequently show mixtures of knobs-into-holes and ridges-into- 
grooves interactions. 

In summary, coiled coils are versatile structural elements that are widely 
used in many protein families. The wealth of forms discovered by high- 
resolution structure studies in recent years, some of which are truly 
astonishing even to a seasoned coiled-coil watcher, raises the hope that many 
more still remain to be discovered. 


REFERENCES 


Alfadhli, A., Steel, E., Finlay, L., Bächinger, H. P., and Barklis, E. (2002). Hantavirus 
nucleocapsid protein coiled-coil domains. J. Biol. Chem. 277, 27103-27108. 

Araya, E., Berthier, C., Kim, E., Yeung, T., Wang, X., and Helfman, D. M. (2002). 
Regulation of coiled-coil assembly in tropomyosins. J. Struct. Biol. 137, 176-183. 

Arndt, K. M., Pelletier, J. N., Müller, K. M., Alber, T., Michnick, S. W., and Plückthun, A. 
(2000). A heterodimeric coiled-coil peptide pair selected in vivo from a designed 
library-versus-library ensemble. J. Mol. Biol. 295, 627-639. 

Arndt, K. M., Pelletier, J. N., Müller, K. M., Plückthun, A., and Alber, T. (2002). 
Comparison of in vivo selection and rational design of heterodimeric coiled coils. 
Structure 10, 1235-1248. 

Arnott, S., and Dover, S. D. (1967). Refinement of bond angles of an α-helix. J. Mol. 
Biol. 30, 209-212. 


Beck, K., Dixon, T. W., Engel, J., and Parry, D. A. D. (1993). Ionic interactions in the 
coiled-coil domain of laminin determine the specificity of chain assembly. J. Mol. 
Biol. 231, 311-323. 

Beck, K., Gambee, J. E., Kamawal, A., and Bächinger, H. P. (1997). A single amino acid 
can switch the oligomerization state of the α-helical coiled-coil domain of cartilage 
matrix protein. EMBO J. 16, 3767-3777. 

Berger, B., and Singh, M. (1997). An iterative method for improved protein structural 
motif recognition. J. Comput. Biol. 4, 261-273. 

Berger, B., Wilson, D. B., Wolf, E., Tonchev, T., Milla, M., and Kim, P. S. (1995). 
Predicting coiled coils by use of pairwise residue correlations. Proc. Natl. Acad. 
Sci. 92, 8259-8263. 

Boguski, M. S., Elshourbagy, N., Taylor, J. M., and Gordon, J. I. (1985). Comparative 
analysis of repeated sequences in rat apolipoproteins A-I, A-IV, and E. Proc. Natl. 
Acad. Sci. 82, 992-996. 

Brown, J. H., Cohen, C., and Parry, D. A. D. (1996). Heptad breaks in α-helical coiled 
coils: Stutters and stammers. Proteins 26, 134-145. 

Brown, J. H., Volkmann, N., Jun, G., Henschen-Edman, A. H., and Cohen, C. (2000). 
The crystal structure of modified bovine fibrinogen. Proc. Natl. Acad. Sci. 97, 85-90. 

Brown, J. H., Kim, K. H., Jun, G., Greenfield, N. J., Dominguez, R., Volkmann, N., 
Hitchcock-DeGregori, S. E., and Cohen, C. (2001). Deciphering the design of the 
tropomyosin molecule. Proc. Natl. Acad. Sci. 98, 8496-8501. 

Burkhard, P., Steinmetz, M. O., Schulthess, T., Landwehr, R., Aebi, U., and Kammerer, 
R. A. (1998). Crystallization and preliminary X-ray diffraction analysis of the 190-Å- 
long coiled-coil dimerization domain of the actin-bundling protein cortexillin I 
from Dictyostelium discoideum. J. Struct. Biol. 122, 293-296. 

Campbell, K. M., Sholders, A. J., and Lumb, K. J. (2002). Contribution of buried lysine 
residues to the oligomerization specificity and stability of the Fos coiled coil. 
Biochemistry 41, 4866-4871. 

Chothia, C., Levitt, M., and Richardson, D. (1977). Structure of proteins: Packing of 
α-helices and pleated sheets. Proc. Natl. Acad. Sci. 74, 4130-4134. 

Chothia, C., Levitt, M., and Richardson, D. (1981). Helix to helix packing in proteins. 
J. Mol. Biol. 145, 215-250. 

Cohen, C., and Holmes, K. C. (1963). X-ray diffraction evidence for α-helical coiled- 
coils in native muscle. J. Mol. Biol. 6, 423-432. 

Cohen, C., and Parry, D. A. D. (1986). α-Helical coiled coils—A widespread motif in 
proteins. Trends Biochem. Sci. 11, 245-248. 

Cohen, C., and Parry, D. A. D. (1990). α-Helical coiled coils and bundles: How to 
design an α-helical protein. Proteins 7, 1-15. 

Conway, J. F., and Parry, D. A. D. (1990). Structural features in the heptad substructure 
and longer range repeats of two-stranded α-fibrous proteins. Int. J. Biol. Macromol. 
12, 328-334. 

Conway, J. F., and Parry, D. A. D. (1991). Three-stranded α-fibrous proteins: The 
heptad repeat and its implication for structure. Int. J. Biol. Macromol. 13, 14-16. 

Craig, L., Taylor, R. K., Pique, M. E., Adair, B. D., Arvai, A. S., Singh, M., Lloyd, S. J., 
Shin, D. S., Getzoff, E. D., Yeager, M., Forest, K. T., and Tainer, J. A. (2003). Type 
IV pilin structure and assembly: X-ray and EM analyses of Vibrio cholerae toxin- 
coregulated pilus and Pseudomonas aeruginosa PAK pilin. Mol. Cell 11, 1139-1150. 

Crick, F. H. C. (1952). Is α-keratin a coiled coil? Nature 170, 882-883. 

Crick, F. H. C. (1953a). The Fourier transform of a coiled-coil. Acta Crystallogr. 6, 
685-689. 


Crick, F. H. C. (1953b). The packing of α-helices: Simple coiled-coils. Acta Crystallogr. 6, 
689-697. 

Delorenzi, M., and Speed, T. (2002). An HMM model for coiled-coil domains and a 
comparison with PSSM-based predictions. Bioinformatics 18, 617-625. 

Dure, L., 3rd (1993). A repeating 11-mer amino acid motif and plant desiccation. 
Plant J. 3, 363-369. 

Fong, J. H., Keating, A. E., and Singh, M. (2004). Predicting specificity in bZIP coiled- 
coil protein interactions. Genome Biol. 5, R11. 

Frank, S., Lustig, A., Schulthess, T., Engel, J., and Kammerer, R. A. (2000). A distinct 
seven-residue trigger sequence is indispensable for proper coiled-coil formation of 
the human macrophage scavenger receptor oligomerization domain. J. Biol. Chem. 
275, 11672-11677. 

Fraser, R. D. B., and MacRae, T. P. (1973). In “Conformation in Fibrous Proteins and 
Related Synthetic Polypeptides,” pp. 456-465. Academic Press, London. 

Glover, J. N., and Harrison, S. C. (1995). Crystal structure of the heterodimeric bZIP 
transcription factor c-Fos-c-Jun bound to DNA. Nature 373, 257-261. 

Gonzales, L., Jr., Brown, R. A., Richardson, D., and Alber, T. (1996a). Crystal structures 
of a single coiled-coil peptide in two oligomeric states reveal the basis for structural 
polymorphism. Nat. Struct. Biol. 3, 1002-1010. 

Gonzales, L., Jr., Woolfson, D. N., and Alber, T. (1996b). Buried polar residues 
and structural specificity in the GCN4 leucine zipper. Nat. Struct. Biol. 3, 
1011-1018. 

Gruber, M., and Lupas, A. N. (2003). Historical review: Another 50th anniversary—New 
periodicities in coiled coils. Trends Biochem. Sci. 28, 679-685. 

Harbury, P. B., Zhang, T., Kim, P. S., and Alber, T. (1993). A switch between two-, three-, 
and four-stranded coiled coils in GCN4 leucine zipper mutants. Science 262, 
1401-1407. 

Harbury, P. B., Kim, P. S., and Alber, T. (1994). Crystal structure of an isoleucine-zipper 
trimer. Nature 371, 80-83. 

Harbury, P. B., Plecs, J. J., Tidor, B., Alber, T., and Kim, P. S. (1998). High-resolution 
protein design with backbone freedom. Science 282, 1462-1467. 

Harris, N. L., Presnell, S. R., and Cohen, F. E. (1994). Four helix bundle diversity in 
globular proteins. J. Mol. Biol. 236, 1356-1368. 

Hicks, M. R., Holberton, D. V., Kowalczyk, C., and Woolfson, D. N. (1997). Coiled-coil 
assembly by peptides with non-heptad sequence motifs. Fold. Des. 2, 149-158. 

Higgins, M. K., Bokma, E., Koronakis, E., Hughes, C., and Koronakis, V. (2004). 
Structure of the periplasmic component of a bacterial drug efflux pump. Proc. 
Natl. Acad. Sci. 101, 9994-9999. 

Hoiczyk, E., Roggenkamp, A., Reichenbecher, M., Lupas, A., and Heesemann, J. 
(2000). Structure and sequence analysis of Yersinia YadA and Moraxella UspAs 
reveal a novel class of adhesins. EMBO J. 19, 5989-5999. 

Holberton, D., Baker, D. A., and Marshall, J. (1988). Segmented α-helical coiled- 
coil structure of the protein giardin from the Giardia cytoskeleton. J. Mol. Biol. 
204, 789-795. 

Holton, J., and Alber, T. (2004). Automated protein crystal structure determination 
using ELVES. Proc. Natl. Acad. Sci. 101, 1537-1542. 

Junius, F. K., O'Donoghue, S. I., Nilges, M., Weiss, A. S., and King, G. F. (1996). High 
resolution NMR solution structure of the leucine zipper domain of the c-Jun 
homodimer. J. Biol. Chem. 271, 13663-13667. 


Kammerer, R. A., Schulthess, T., Landwehr, R., Lustig, A., Engel, J., Aebi, U., and 
Steinmetz, M. O. (1998). An autonomous folding unit mediates the assembly of 
two-stranded coiled coils. Proc. Natl. Acad. Sci. 95, 13419-13424. 

Kohl, A., Binz, H. K., Forrer, P., Stumpp, M. T., Plückthun, A., and Grütter, M. G. 
(2003). Designed to be stable: Crystal structure of a consensus ankyrin repeat 
protein. Proc. Natl. Acad. Sci. 100, 1700-1705. 

Krylov, D., Mikhailenko, I., and Vinson, C. (1994). A thermodynamic scale for leucine 
zipper stability and dimerization specificity: e and g interhelical interactions. 
EMBO J. 13, 2849-2861. 

Landschulz, W. H., Johnson, P. F., and McKnight, S. L. (1988). The leucine zipper: A 
hypothetical structure common to a new class of DNA binding proteins. Science 
240, 1759-1764. 

Lee, D. L., Lavigne, P., and Hodges, R. S. (2001). Are trigger sequences essential in the 
folding of two-stranded α-helical coiled-coils? J. Mol. Biol. 306, 539-553. 

Li, Y., Mui, S., Brown, J. H., Strand, J., Reshetnikova, L., Tobacman, L. S., and Cohen, 
C. (2002). The crystal structure of the C-terminal fragment of striated-muscle 
α-tropomyosin reveals a key troponin T recognition site. Proc. Natl. Acad. Sci. 99, 
7378-7383. 

Lovejoy, B., Choe, S., Cascio, D., McRorie, D. K., DeGrado, W. F., and Eisenberg, D. 
(1993). Crystal structure of a synthetic triple-stranded α-helical bundle. Science 259, 
1288-1293. 

Lupas, A. (1996a). Prediction and analysis of coiled-coil structures. Meth. Enzym. 266, 
513-525. 

Lupas, A. (1996b). Coiled coils: New structures and new functions. Trends Biochem. Sci. 
21, 375-382. 

Lupas, A., Van Dyke, S., and Stock, J. (1991). Predicting coiled coils from protein 
sequences. Science 252, 1162-1164. 

Lupas, A., Müller, S., Goldie, K., Engel, A. M., Engel, A., and Baumeister, W. (1995). 
Model structure of the Ompα rod, a parallel four-stranded coiled coil from the 
hyperthermophilic eubacterium Thermotoga maritima. J. Mol. Biol. 248, 180-189. 

Main, E. R., Xiong, Y., Cocco, M. J., D’Andrea, L., and Regan, L. (2003). Design of 
stable α-helical arrays from an idealized TPR motif. Structure 11, 497-508. 

Marshall, J., and Holberton, D. V. (1993). Sequence and structure of a new coiled coil 
protein from a microtubule bundle in Giardia. J. Mol. Biol. 231, 521-530. 

Marshall, J., and Holberton, D. V. (1995). Giardia gene predicts a 183 kDa nucleotide- 
binding head-stalk protein. J. Cell Sci. 108, 2683-2692. 

Martin, J., Gruber, M., and Lupas, A. N. (2004). Coiled coils meet the chaperone world. 
Trends Biochem. Sci. 29, 455-458. 

McKay, D. B., and Steitz, T. A. (1981). Structure of catabolite gene activator protein at 
2.9 Å resolution suggests binding to left-handed B-DNA. Nature 290, 744-749. 

McLachlan, A. D., and Karn, J. (1983). Periodic features in the amino acid sequence of 
nematode myosin rod. J. Mol. Biol. 164, 605-626. 

McLachlan, A. D., and Stewart, M. (1975). Tropomyosin coiled-coil interactions: 
Evidence for an unstaggered structure. J. Mol. Biol. 98, 293-304. 

McLachlan, A. D., and Stewart, M. (1976). The 14-fold periodicity in α-tropomyosin 
and the interaction with actin. J. Mol. Biol. 103, 271-298. 

Mittl, P. R. E., Deillon, C., Sargent, D., Liu, N., Klauser, S., Thomas, R. M., Gutte, B., 
and Grütter, M. G. (2000). The retro-GCN4 leucine zipper sequence forms a stable 
three-dimensional structure. Proc. Natl. Acad. Sci. 97, 2562-2566. 


Monera, O. D., Zhou, N. E., Lavigne, P., Kay, C. M., and Hodges, R. S. (1996). Formation 
of parallel and antiparallel coiled-coils controlled by the relative positions of 
alanine residues in the hydrophobic core. J. Biol. Chem. 271, 3995-4001. 

Mosavi, L. K., Minor, D. L., Jr., and Peng, Z. Y. (2002). Consensus-derived structural 
determinants of the ankyrin repeat motif. Proc. Natl. Acad. Sci. 99, 16029-16034. 

Nilges, M., and Brünger, A. T. (1991). Automated modeling of coiled coils: Application 
to the GCN4 dimerization region. Protein Eng. 4, 649-659. 

Nooren, I. M. A., Kaptein, R., Sauer, R. T., and Boelens, R. (1999). The tetramerization 
domain of the Mnt repressor consists of two right-handed coiled coils. Nat. Struct. 
Biol. 6, 755-759. 

O’Donoghue, S. I., and Nilges, M. (1997). Tertiary structure prediction using mean- 
force potentials and internal energy functions: Successful prediction for coiled-coil 
geometries. Fold. Des. 2, 47-52. 

Offer, G., and Sessions, R. (1995). Computer modelling of the α-helical coiled coil: 
Packing of side-chains in the inner core. J. Mol. Biol. 249, 967-987. 

Offer, G., Hicks, M. R., and Woolfson, D. N. (2002). Generalized Crick equations for 
modeling noncanonical coiled coils. J. Struct. Biol. 137, 41-53. 

O’Shea, E. K., Klemm, J. D., Kim, P. S., and Alber, T. (1991). X-ray structure of the 
GCN4 leucine zipper, a two-stranded, parallel coiled coil. Science 254, 539-544. 

Parry, D. A. D. (1975). Analysis of the primary sequence of α-tropomyosin from rabbit 
skeletal muscle. J. Mol. Biol. 98, 519-535. 

Parry, D. A. D. (1982). Coiled-coils in α-helix-containing proteins: Analysis of the 
residue types within the heptad repeat and the use of these data in the prediction 
of coiled-coils in other proteins. Biosci. Rep. 2, 1017-1024. 

Parry, D. A. D., Crewther, W. G., Fraser, R. D., and MacRae, T. P. (1977). Structure of 
α-keratin: Structural implication of the amino acid sequences of the type I and type 
II chain segments. J. Mol. Biol. 113, 449-454. 

Pauling, L., and Corey, R. B. (1950). Two hydrogen-bonded spiral configurations of the 
polypeptide chain. J. Am. Chem. Soc. 72, 534. 

Pauling, L., and Corey, R. B. (1953). Compound helical configurations of polypeptide 
chains: Structure of proteins of the α-keratin type. Nature 171, 59-61. 

Pauling, L., Corey, R. B., and Branson, H. R. (1951). The structure of proteins: Two 
hydrogen-bonded helical configurations of the polypeptide chain. Proc. Natl. Acad. 
Sci. 37, 205-211. 

Perutz, M. F. (1951). The 1.5-Å reflexion from proteins and polypeptides. Nature 168, 
653-654. 

Peters, J., Nitsch, M., Kuhlmorgen, B., Golbik, R., Lupas, A., Kellermann, J., 
Engelhardt, H., Pfander, J. P., Müller, S., Goldie, K., Engel, A., Stetter, K.-O., 
and Baumeister, W. (1995). Tetrabrachion: A filamentous archaebacterial sur- 
face protein assembly of unusual structure and extreme stability. J. Mol. Biol. 
245, 385-401. 

Peters, J., Baumeister, W., and Lupas, A. (1996). Hyperthermostable surface layer 
protein tetrabrachion from the archaebacterium Staphylothermus marinus: Evidence 
for the presence of a right-handed coiled coil derived from the primary structure. 
J. Mol. Biol. 257, 1031-1041. 

Phillips, G. N., Jr. (1992). What is the pitch of the α-helical coiled coil? Proteins 14, 
425-429. Erratum in Proteins 17, 220. 

Phillips, G. N., Jr., Fillers, J. P., and Cohen, C. (1986). Tropomyosin crystal structure 
and muscle regulation. J. Mol. Biol. 192, 111-131. 


Samatey, F. A., Imada, K., Nagashima, S., Vonderviszt, F., Kumasaka, T., Yamamoto, M., 
and Namba, K. (2001). Structure of the bacterial flagellar protofilament and 
implications for a switch for supercoiling. Nature 410, 331-337. 

Seo, J., and Cohen, C. (1993). Pitch diversity in α-helical coiled coils. Proteins 15, 
223-234. Erratum in Proteins 17, 219. 

Singh, M., Berger, B., Kim, P. S., Berger, J. M., and Cochran, A. G. (1998). Computa- 
tional learning reveals coiled-coil like motifs in histidine kinase linker domains. 
Proc. Natl. Acad. Sci. 95, 2738-2743. 

Singh, M., Berger, B., and Kim, P. S. (1999). LearnCoil-VMF: Computational evidence 
for coiled-coil-like motifs in many viral membrane-fusion proteins. J. Mol. Biol. 290, 
1031-1041. 

Steinmetz, M. O., Stock, A., Schulthess, T., Landwehr, R., Lustig, A., Faix, J., Gerisch, G., 
Aebi, U., and Kammerer, R. A. (1998). A distinct 14 residue site triggers coiled-coil 
formation in cortexillin I. EMBO J. 17, 1883-1891. 

Stetefeld, J., Jenny, M., Schulthess, T., Landwehr, R., Engel, J., and Kammerer, R. A. 
(2000). Crystal structure of a naturally occurring parallel right-handed coiled coil 
tetramer. Nat. Struct. Biol. 7, 772-776. 

Strelkov, S. V., and Burkhard, P. (2002). Analysis of α-helical coiled coils with the 
program TWISTER reveals a structural mechanism for stutter compensation. 
J. Struct. Biol. 137, 54-64. 

Tarbouriech, N., Curran, J., Ruigrok, R. W. H., and Burmeister, W. P. (2000). Tetra- 
meric coiled coil domain of Sendai virus phosphoprotein. Nat. Struct. Biol. 7, 
777-781. 

Tripet, B., Wagschal, K., Lavigne, P., Mant, C. T., and Hodges, R. S. (2000). Effects of 
side-chain characteristics on stability and oligomerization state of a de novo-designed 
model coiled-coil: 20 amino acid substitutions in position “d”. J. Mol. 
Biol. 300, 377-402. 

Trybus, K. M., Freyzon, Y., Faust, L. Z., and Sweeney, H. L. (1997). Spare the rod, spoil 
the regulation: Necessity for a myosin rod. Proc. Natl. Acad. Sci. 94, 48-52. 

Vinson, C. R., Hai, T., and Boyd, S. M. (1993). Dimerization specificity of the leucine 
zipper-containing bZIP motif on DNA binding: Prediction and rational design. 
Genes Dev. 7, 1047-1058. 

Wagschal, K., Tripet, B., Lavigne, P., Mant, C., and Hodges, R. S. (1999). The role of 
position a in determining the stability and oligomerization state of α-helical coiled 
coils: 20 amino acid stability coefficients in the hydrophobic core of proteins. 
Protein Sci. 8, 2312-2329. 

Walshaw, J., and Woolfson, D. N. (2001). SOCKET: A program for identifying and 
analyzing coiled-coil motifs within protein structures. J. Mol. Biol. 307, 1427-1450. 

Walshaw, J., and Woolfson, D. N. (2003). Extended knobs-into-holes packing in classi- 
cal and complex coiled-coil assemblies. J. Struct. Biol. 144, 349-361. 

Walshaw, J., Shipway, J. M., and Woolfson, D. N. (2001). Guidelines for the assembly of 
novel coiled-coil structures: α-Sheets and α-cylinders. Biochem. Soc. Symp. 68, 
111-123. 

Whitby, F. G., and Phillips, G. N., Jr. (2000). Crystal structure of tropomyosin at 7 Å 
resolution. Proteins 38, 49-59. 

Whitby, F. G., Kent, H., Stewart, F., Stewart, M., Xie, X., Hatch, V., Cohen, C., and 
Phillips, G. N., Jr. (1992). Structure of tropomyosin at 9 Å resolution. J. Mol. Biol. 
227, 441-452. 

Wilson, I. A., Skehel, J. J., and Wiley, D. C. (1981). Structure of the haemagglutinin 
membrane glycoprotein of influenza virus at 3 Å resolution. Nature 289, 366-373. 


Wolf, E., Kim, P. S., and Berger, B. (1997). MultiCoil: A program for predicting two- 
and three-stranded coiled coils. Protein Science 6, 1179-1189. 

Woolfson, D. N., and Alber, T. (1995). Predicting oligomerization states of coiled coils. 
Protein Sci. 4, 1596-1607. 

Wu, K. C., Bryan, J. T., Morasso, M. I., Jang, S.-I., Lee, J.-H., Yang, J.-M., Marekov, L. N., 
Parry, D. A. D., and Steinert, P. M. (2000). Coiled-coil trigger motifs in the 1B and 
2B rod domain segments are required for the stability of keratin intermediate 
filaments. Mol. Biol. Cell 11, 3539-3558. 

Yang, Z., Mochalkin, I., Veerapandian, L., Riley, M., and Doolittle, R. F. (2000). Crystal 
structure of native chicken fibrinogen at 5.5 Å resolution. Proc. Natl. Acad. Sci. 97, 
3907-3912. 

Yang, Z., Kollman, J. M., Pandi, L., and Doolittle, R. F. (2001). Crystal structure of 
native chicken fibrinogen at 2.7 Å resolution. Biochemistry 40, 12515-12523. 

Yonekura, K., Maki-Yonekura, S., and Namba, K. (2003). Complete atomic model of the 
bacterial flagellar filament by electron cryomicroscopy. Nature 424, 643-650. 

Zhang, L., and Hermans, J. (1993). Calculation of the pitch of the α-helical coiled coil: 
An addendum. Proteins 17, 217-218. 


THE DESIGN OF COILED-COIL STRUCTURES 
AND ASSEMBLIES 


By DEREK N. WOOLFSON 


Department of Biochemistry, School of Life Sciences, University of Sussex, 
Falmer BN1 9QG, United Kingdom 


I. Introduction to Protein Design 
II. Rules for Coiled-Coil Design: The Basics of Coiled-Coil Sequence and Structure 
    A. Hydrophobic Interactions and Helix Formation 
    B. van der Waals’ Forces, Steric Constraints, and Oligomer Specification 
    C. Salt-Bridge Interactions and Partner Selection 
    D. Buried Polar Groups: The Icing on the Cake 
III. Key Coiled-Coil Designs 
    A. Parallel Structures 
    B. Antiparallel Structures 
IV. Summary 
References 

ABSTRACT 


Protein design allows sequence-to-structure relationships in proteins to 
be examined and, potentially, new protein structures and functions to be 
made to order. To succeed, however, the protein-design process requires 
reliable rules that link protein sequence to structure/function. Although 
our present understanding of coiled-coil folding and assembly is not 
complete, through numerous bioinformatics and experimental studies 
there are now sufficient rules to allow confident design attempts of natu- 
rally observed and even novel coiled-coil motifs. This review summarizes 
the current design rules for coiled coils, and describes some of the 
key successful coiled-coil designs that have been created to date. The 
designs range from those for relatively straightforward, naturally observed 
structures—including parallel and antiparallel dimers, trimers and tetra- 
mers, all of which have been made as homomers and heteromers—to 
more exotic structures that expand the repertoire of Nature’s coiled-coil 
structures. Examples in the second bracket include a probe that binds a 
cancer-associated coiled-coil protein; a tetramer with a right-handed super- 
coil; sticky-ended coiled coils that self-assemble to form fibers; coiled coils 
that switch conformational state; a three-component two-stranded coiled 
coil; and an antiparallel dimer that directs fragment complementation of 
larger proteins. Some of the more recent examples show an important 
development in the field; namely, new designs are being created with 
function as well as structure in mind. This will remain one of the key 
challenges in coiled-coil design in the next few years. Other challenges 
that lie ahead include the need to discover more rules for coiled-coil 
prediction and design, and to implement these in prediction and design 
algorithms. The considerable success of coiled-coil design so far bodes 
well for this, however. It is likely that these challenges will be met and 
surpassed. 


I. INTRODUCTION TO PROTEIN DESIGN 


Why bother with protein design? After all, nature has produced a wide 
variety of beautiful protein structures with fascinating functions. The 
author’s response to this question has three points. First, protein design 
provides the acid test of our understanding of the informational aspect of 
the protein-folding problem and, in particular, how protein sequence 
relates to the three-dimensional structure (and function) of proteins. 
Second, protein design attempts to capture the salient features of protein 
structure and function in simpler contexts. In other words, any natural 
protein sequence contains superimposed information about the protein’s 
folding, structure and stability, and its function(s). Protein design at- 
tempts to disentangle and use this information. Third, natural proteins 
represent only a tiny fraction of the possible protein sequence space 
available. Whether they also represent a fraction of the possible protein 
structure space is another question. It is not yet clear whether the number 
of possible protein folds is in fact significantly greater than the limited 
number suggested by the natural protein structures observed so far (Liu 
et al., 2004a; Wolf et al., 2000). Whatever the case, protein design offers 
possibilities for exploring 
protein sequences, structures, and functions beyond those examined by 
nature. In turn, it presents a route to new structures and functions with 
potentially exploitable applications. 

There are a number of approaches to protein engineering and design, 
which, for the purposes of this review, are defined here. Protein engi- 
neering in general refers to the process of making one or a relatively 
small number of mutations in a natural protein framework to examine 
sequence-to-structure/function relationships in proteins, and/or to im- 
prove their structural properties or functions. At the other extreme, de 
novo protein design refers to attempts to construct totally new protein 
sequences with prescribed structures (and functions) from first principles. 
By protein redesign, this author refers to the process of mutating al- 
ready designed scaffolds to create new molecules with improved structural 
properties, stabilities, and functions. 

Clearly, in de novo design the starting protein sequence is chosen by the 
designer. In protein engineering and redesign, there are a number of 
experimental routes to making mutants. These can be made one or a small 
number at a time, in which case the process is iterative. Alternatively, many 
mutations can be made simultaneously, either at specific sites in the 
scaffold (saturation mutagenesis) or randomly throughout the protein 
(random mutagenesis) to create a library of mutants from which variants 
with desired properties are selected. These methods are referred to as 
combinatorial or semirational. Finally, in both the rational and combina- 
torial approaches, the mutations can be guided by computational methods 
and searches. 
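
As a rough illustration of the combinatorial route, the following sketch enumerates a saturation-mutagenesis library in silico. The function name `saturation_library`, the toy scaffold, and the site indices are hypothetical and chosen only for illustration; in practice such libraries are made and screened experimentally, and this is not a description of any particular published protocol.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def saturation_library(scaffold, sites):
    """Yield every variant of `scaffold` in which each listed site
    (0-based index) is replaced by each of the 20 amino acids."""
    for combo in product(AMINO_ACIDS, repeat=len(sites)):
        variant = list(scaffold)
        for site, residue in zip(sites, combo):
            variant[site] = residue
        yield "".join(variant)


# Hypothetical 7-residue scaffold, randomized at two sites.
library = list(saturation_library("IAQLEKK", [0, 3]))
print(len(library))  # 20 ** 2 variants
```

The library grows as 20 raised to the number of randomized sites, which is one reason why computational guidance of the mutations is attractive.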

Two important concepts in de novo protein design are those of positive 
and negative design. In positive design, sequence-to-structure rules are 
used to direct the formation of and stabilize the target structure, whereas 
negative design refers to the idea of designing against (i.e., destabilizing) 
alternative and often competing structures (Beasley and Hecht, 1997; 
Hellinga, 1997; Hill et al., 2000). The application of these principles 
to coiled-coil structures is particularly important because, at least at 
first sight, coiled coils share a straightforward repeat sequence and the 
possibilities for forming the wrong quaternary structure are significant. 

So far, all references to designing protein functions have been bracketed. 
This is because the current state-of-the-art in protein design is at 
the structural level. In other words, we are better at designing protein 
structures than we are at introducing function. This is not to say that the 
redesign and de novo design of protein function is totally beyond us. In- 
deed, protein engineers have been modifying natural protein frameworks 
and their functions for around two decades. 

This review focuses almost exclusively on the rational design and rede- 
sign of coiled-coil motifs. Many reviews have been written on protein 
design in general. Recent papers that, in the view of this author, reflect 
the current state of the art in globular protein de novo design of struc- 
ture and redesign of function are from Baker (Kuhlman et al., 2003) and 
Hellinga (Dwyer et al., 2004; Looger et al., 2003), respectively. Discussion of 
the design and redesign of four-helix bundle proteins is limited. Although 
some four-helix bundles can also be considered as coiled coils, this is not 
always the case. Also, with the vagaries of design in the absence of full 
structure determination, this author considers it best not to group all four- 
helix bundles together with coiled coils at present. Finally, reviews on the 
design of four-helix bundles are available (Hill et al., 2000; Woolfson, 
2001). There have also been a number of commentaries and review papers 
on the design of coiled-coil motifs over the years (Cohen 
and Parry, 1990, 1994; Kohn and Hodges, 1998; MacPhee and Woolfson, 
2004; Micklatcher and Chmielewski, 1999; Oakley and Hollenbeck, 2001; 
Schneider et al., 1998). Finally, in addition to Chapters 3 and 4 in the 
volume on coiled-coil sequences and structures, there have been a number 
of reviews of the coiled-coil field in general (Burkhard et al., 2001; Gruber 
and Lupas, 2003; Lupas, 1996). 

The review is laid out as follows: in Part II, there is a discussion of 
the structural features of the coiled-coil assemblies and the current state- 
of-the-art of sequence-to-structure rules pertinent to design; in Part III, 
there is a detailed description of some of the key coiled-coil designs, 
including the rules used to create them and their significance in the field; 
and, finally, there is a discussion of the potential for new coiled-coil 
designs and where the field might go in the future. 


II. RULES FOR COILED-COIL DESIGN: THE BASICS OF 
COILED-COIL SEQUENCE AND STRUCTURE 


Any design and engineering process requires an understanding of how 
to assemble the basic building blocks available into the objects being 
targeted. Though it is not always absolutely necessary, ideally this under- 
standing should be at as fundamental a level as possible. Protein design 
and assembly are no different. Here, we refer to rules that link protein 
sequence and structure. These rules can be general, such as “place 
hydrophobic amino acids alternately three and four residues apart to 
direct the folding and assembly of amphipathic α-helices,” or a little more 
specific, such as “make every second hydrophobic amino acid Leu to 
guide the assembly of dimers.” These are examples of rules of thumb 
that allow protein designers to build up a protein sequence compatible 
with a desired target structure. They can be used manually or built into 
algorithms to be implemented computationally. In many cases, these rules 
work very well and, if engineering was the only goal of protein design, that 
might be the end of it. However, it is much more powerful, and ultimately 
more intellectually satisfying, to understand the fundamental basis of 
the rules. In terms of protein design, the chemical level of understand- 
ing suffices (i.e., how the sequence-to-structure rules can be rationalized 
in terms of the underlying noncovalent forces). For this reason, the 
current state-of-the-art of the rules for coiled-coil design is presented 
below in terms of the noncovalent forces that direct coiled-coil folding 
and assembly. 

For a more thorough description and discussion of coiled-coil struc- 
tures, the reader is referred to Chapter 3. For the purposes of this 
review, coiled coils are described as canonical or noncanonical. Canonical 
coiled coils are those based on tandem heptad sequence repeats that form 
right-handed amphipathic α-helices, which then assemble to form helical 
bundles with left-handed supercoils. In contrast, noncanonical α-helical 
coiled coils are built from nonheptad-based repeats and, as a conse- 
quence, do not necessarily form coiled coils with left-handed or even 
regular supercoils. For instance, 11-residue repeat sequences give rise to 
right-handed coiled coils. The majority of this review focuses on designs 
made using canonical, heptad-based coiled coils as their starting points. 


A. Hydrophobic Interactions and Helix Formation 


As with most, if not all, biological self-assembly processes, a key driving 
force in coiled-coil folding and assembly is the hydrophobic effect. Put 
simply, the hydrophobic effect is the phenomenon that hydrocarbon 
and water tend to phase separate. Thus, when in the context of a 
biological, aqueous buffer, biological macromolecules fold or self-assem- 
ble to minimize the hydrophobic surface area in contact with bulk solvent. 
In the case of proteins, which can be considered as essentially linear 
heteropolymers of hydrophobic (H: Ala, Phe, Ile, Leu, Met, Val, Trp, 
and Tyr) and polar (P: Asp, Glu, His, Lys, Asn, Gln, Arg, Ser, and Thr) 
residues, the polypeptide chain folds to bury H residues and expose P side 
chains. (In these terms, there is also a third category of amino acid 
residue, which can be designated as special [S: Cys, Gly, and Pro]. These 
amino acids confer unusual conformational properties on the polypeptide 
chain. However, for reasons outlined in the main text, these residues 
occur less frequently in coiled-coil sequences compared with protein 
sequences as a whole.) Therefore, to a first approximation, the structures 
arrived at through this reaction to the aqueous medium reflect the pattern 
of H and P residues along the primary sequence. For instance, alternating 
patterns of H and P residues (HPHP...) tend to adopt β-strand conformations 
and lead to β-sheet-based structures. This is because alternate residues 
along a β-strand point in opposite directions out from the strand, so 
an alternating HP pattern produces an amphipathic β-strand (i.e., a strand 
with the H residues sequestered on one side and the P residues on the 
other). In turn, two or more such strands come together to form an 
amphipathic β-sheet, which can pack into a globular structure to bury 
its hydrophobic face. 
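
The H/P/S classification above can be sketched in a few lines of code. The residue groupings follow the lists given in the text; the function names `hp_pattern` and `is_alternating` and the test peptides are illustrative assumptions rather than part of any established tool.

```python
# Residue classes as listed in the text (one-letter codes).
HYDROPHOBIC = set("AFILMVWY")  # H: Ala, Phe, Ile, Leu, Met, Val, Trp, Tyr
POLAR = set("DEHKNQRST")       # P: Asp, Glu, His, Lys, Asn, Gln, Arg, Ser, Thr
SPECIAL = set("CGP")           # S: Cys, Gly, Pro


def hp_pattern(sequence):
    """Reduce a one-letter amino-acid sequence to a string of H, P and S."""
    classes = []
    for residue in sequence.upper():
        if residue in HYDROPHOBIC:
            classes.append("H")
        elif residue in POLAR:
            classes.append("P")
        elif residue in SPECIAL:
            classes.append("S")
        else:
            raise ValueError(f"unknown residue: {residue!r}")
    return "".join(classes)


def is_alternating(pattern):
    """True if H and P strictly alternate (the strand-like HPHP... pattern)."""
    return len(pattern) > 1 and all(
        {x, y} == {"H", "P"} for x, y in zip(pattern, pattern[1:])
    )


print(hp_pattern("VKVKVKV"))                  # strand-like toy peptide
print(is_alternating(hp_pattern("VKVKVKV")))
print(hp_pattern("LKKLEEK"))                  # heptad-like toy peptide
```

A binary pattern of this kind is only a first approximation; it says nothing about which particular hydrophobic or polar residue occupies each site.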

As the α-helix has 3.6 residues per turn, hydrophobic side chains spaced 
at combinations of three and four residues apart are required to make an 
amphipathic structure. For helices in globular proteins, a variety of com- 
binations of three-plus-four spacings of hydrophobic residues are observed 
(Chothia et al., 1981). This leads to a range of helix-helix packing angles 
and arrangements. However, to form more persistent fibrous coiled-coil 
structures, more regular HP patterns are required. The canonical heptad 
pattern, HPPHPPP, seems to predominate in nature. The full reasons 
for this are not clear but, as described below, it is likely that it leads to 
the most efficient way of packing two or more helices in a fibrous helical 
bundle. 

The HPPHPPP pattern is usually denoted abcdefg, with a and d assigned 
to the H residues, and can be pictured in terms of a helical wheel 
(Fig. 1C and D). Tandem heptad repeats along a polypeptide chain give 
an average separation between H residues of 3.5 residues. As this falls short 
of the 3.6 residues per turn of a regular α-helix, the a plus d hydrophobic 
face tracks around the helix with its own helical pitch, which is slighter and 
in the opposite direction of the backbone α-helix. Thus, when two or more 
helices pack together to form a coiled-coil oligomer, they do not pack 
straight as suggested by standard helical wheel diagrams, but wrap around 
one another in order to maximize contacts between the hydrophobic 
surfaces (Fig. 1A and D). The sense of this so-called supercoiling is the 
same as the hydrophobic stripe. Thus, in the case of a canonical coiled 
coil, the helices pack with a left-handed supercoil. 
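
The arithmetic behind this argument can be made explicit with a short sketch. It assumes an idealized α-helix of exactly 3.6 residues per turn and compares a sequence repeat with the nearest whole number of helical turns (two turns for a heptad and, as an assumption introduced here, three turns for an 11-residue repeat). The function names are hypothetical.

```python
RESIDUES_PER_TURN = 3.6                      # regular alpha-helix (from the text)
DEG_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN  # about 100 degrees per residue


def drift_per_repeat(repeat_length, whole_turns):
    """Angle (degrees) by which one sequence repeat over- or undershoots
    `whole_turns` complete turns of the alpha-helix."""
    return repeat_length * DEG_PER_RESIDUE - 360.0 * whole_turns


def stripe_sense(drift):
    """Handedness of the hydrophobic stripe, taking the alpha-helix itself
    as right-handed: an undershoot winds the opposite way."""
    if abs(drift) < 1e-9:
        return "straight"
    return "left-handed" if drift < 0 else "right-handed"


heptad = drift_per_repeat(7, 2)     # 7 residues against 2 turns
undecad = drift_per_repeat(11, 3)   # 11 residues against 3 turns (assumed)

print(round(heptad, 6), stripe_sense(heptad))
print(round(undecad, 6), stripe_sense(undecad))
```

Under these assumptions each heptad falls about 20 degrees short of two full turns, so the a/d stripe winds slowly in the left-handed sense, whereas an 11-residue repeat overshoots three turns by a similar amount, consistent with the right-handed supercoils noted earlier for such repeats.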

In terms of coiled-coil design several questions emerge from this rudi- 
mentary structural analysis. First, what is the minimum number of contig- 
uous heptad repeats required to achieve a stably folded coiled-coil 
structure? The full answer to this is a little involved because for some 
natural coiled-coil sequences “trigger” sequences are believed to be 
essential for folding, even for very long sequences (Burkhard et al., 
2000a; Kammerer et al., 1998; Lee et al., 2001; Steinmetz et al., 1998; 
Walshaw and Woolfson, 2001b). However, for design purposes, three 
or four heptad repeats appear to be sufficient for stable coiled-coil 
folding (Lumb et al., 1994; Su et al., 1994; Talbot and Hodges, 1982), 
though, as discussed below, a two-heptad design has been described, albeit 
with poor oligomer-state specificity (Burkhard et al., 2000b; Meier et al., 
2002). 

Second, what type of residues are best at the a and d sites? This is also 
a difficult question to answer directly because, as will be addressed in 
more detail over the next few sections, the nature of residues at these sites 
influences coiled-coil stability, oligomer state, partner selection, and helix- 
helix orientation (Table I). However, in general terms, natural coiled-coil 
sequences tend to use the aliphatic hydrophobic residues (Ala, Ile, Leu, 
Met, and Val) at these positions, rather than the aromatic hydrophobic 
side chains (Phe, Trp and Tyr) (Parry, 1982; Woolfson and Alber, 1995). 
The reason for this is probably a combination of bulk and steric constraints 
presented by the aromatic residues. However, a thorough understanding 
of the possible exclusion of aromatic side chains from coiled-coil 
cores awaits further protein-engineering experiments in which multiple 
aromatic residues are introduced at a and d. Note added in proof: one such 
experiment has now been done (see Liu et al., 2004b), and an a = d = Trp 
peptide forms a pentamer. 

Fig. 1. (A) A dimeric coiled coil [pdb identifier, 2zta; (O’Shea et al., 1991)]. (B) An 
antiparallel, two-stranded coiled coil [pdb identifier, 1sry; (Fujinaga et al., 1993)]. The 
latter has been cropped to a region of similar length to the leucine-zipper peptide. 
These figures were generated using Molscript (Kraulis, 1991). (C and D) The 
heptad-repeat sequence abcdefg mapped onto these two types of structure (i.e., parallel 
and antiparallel two-helix structures, respectively). Reproduced from Pandya et al. 
(2004). 

Third, what residues are best at the remaining b, c, e, f, and g sites? 
Again, this will be addressed in more detail below. However, in general, 
these positions are more permissive than the a and d sites, though polar 
and helix-favoring residues (Ala, Glu, Lys, and Gln) tend to be favored 
both by nature and by protein designers for these positions. 

TABLE I 
Preferred Amino Acids at the a and d Sites of Known Coiled-Coil Structures 

                             Most preferred amino acids 
Structural type              a                               d 
Parallel two-stranded        Asn > Val > Leu > Ile (1.77)    Leu > Met 
Parallel three-stranded      Ile > Val > Leu (1.91)          Leu >> Ile 
Parallel four-stranded       Met & Ile > Val > Leu > Ala     Met > Ile > Gln 
Parallel five-stranded       Leu > Ile                       Met > Gln > Thr > Val 
Antiparallel two-stranded    Leu & Ile                       Leu 

These residues occurred twice or more often than expected by chance at the 
SOCKET-assigned heptad positions of structurally verified coiled-coil motifs in 
the Protein Data Bank (Walshaw and Woolfson, 2001b). These data are taken from 
amino acid profiles that were generated by normalizing the raw count data by the 
observed frequencies of each amino acid in the SWISSPROT database. The bracketed 
figures after some of the entries give the normalized frequency for that amino acid 
at that position; in these cases, the frequencies fall below the nominal cutoff of 
2 for automatic inclusion in the table, but the author feels that these particular cases 
should be included for reference to the main text. The full profiles can be found at: 
http://www.biols.susx.ac.uk/Biochem/Woolfson/html/coiledcoils/cccat/mulal/stats/collated_tally_norm_rep.html 
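
The kind of normalization described in the table note can be sketched as follows. This is a minimal illustration that assumes the normalized frequency is the observed fraction of a residue at a heptad position divided by its background fraction in the sequence database; the exact procedure behind Table I may differ. The counts and background values are invented toy numbers, not the SOCKET or SWISSPROT data, and the function names are hypothetical.

```python
def normalized_profile(counts, background):
    """Observed fraction of each residue at a heptad position divided by
    its background (database-wide) fraction."""
    total = sum(counts.values())
    return {aa: (n / total) / background[aa] for aa, n in counts.items()}


def preferred(profile, cutoff=2.0):
    """Residues at or above the cutoff, most enriched first."""
    hits = [aa for aa, value in profile.items() if value >= cutoff]
    return sorted(hits, key=lambda aa: profile[aa], reverse=True)


# Invented toy numbers, for illustration only.
toy_counts = {"Leu": 50, "Ile": 30, "Lys": 20}
toy_background = {"Leu": 0.10, "Ile": 0.05, "Lys": 0.20}

profile = normalized_profile(toy_counts, toy_background)
print(preferred(profile))
```

A value of 1 means that the residue occurs at that position as often as expected by chance; the cutoff of 2 corresponds to the twofold enrichment used for inclusion in the table.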


Furthermore, regarding the special residues Cys, Gly, and Pro, these 
occur with quite low frequencies in coiled-coil proteins (Conway and 
Parry, 1990, 1991; Lupas et al., 1991; Parry, 1982; Woolfson and Alber, 
1995) and tend only to be used to perform very specific roles in de novo 
coiled-coil designs. Gly and Pro are regarded as α-helix breakers and so 
are rarely chosen by designers for the central regions of coiled-coil 
structures. Cys can form disulphide bridges or be otherwise oxidized. It is 
usually avoided in design sequences unless it is specifically required for 
cross-linking polypeptide chains. 
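
The rules of thumb collected in this section can be combined into a simple sequence template. The following sketch is purely illustrative: the choice of Ile at a and Leu at d, the particular filler residues at b, c, e, f, and g, and the default of four heptads are assumptions made here for the example and do not correspond to a validated design.

```python
HEPTAD = "abcdefg"

# Hypothetical filler residues for the permissive sites, drawn from the
# polar/helix-favoring set named in the text (Ala, Glu, Lys, Gln).
FILLER = {"b": "A", "c": "Q", "e": "E", "f": "K", "g": "K"}

AVOIDED = set("CGP")  # Cys, Gly, Pro: usually kept out of the central region


def design_sequence(n_heptads=4, a="I", d="L"):
    """Build a heptad-repeat sequence with chosen residues at a and d."""
    residue_at = {"a": a, "d": d, **FILLER}
    heptad = "".join(residue_at[position] for position in HEPTAD)
    return heptad * n_heptads


def core_gaps(sequence):
    """Spacings between successive a/d (core) positions, assuming the
    sequence starts at an a position."""
    core = [i for i in range(len(sequence)) if HEPTAD[i % 7] in "ad"]
    return [j - i for i, j in zip(core, core[1:])]


seq = design_sequence()
print(seq)             # four repeats of the toy heptad
print(core_gaps(seq))  # alternating spacings of three and four residues
```

Such a template captures only the hydrophobic pattern and the avoidance of helix-breaking residues; oligomer state, partner selection, and helix orientation depend on the more specific rules discussed in the following sections.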


B. van der Waals’ Forces, Steric Constraints, 
and Oligomer Specification 


One property of hydrophobic interactions is that they tend not to be 
specific. In other words (and as evident in natural coiled-coil structures), 
the basic coiled-coil pattern HPPHPPP is compatible with a number of 
helix-bundle quaternary structures. Dimeric, trimeric, tetrameric, pentameric, 
and dodecameric coiled coils are all known, as are homomeric and 
heteromeric complexes and topologies with parallel, antiparallel, or 
mixed arrangements of helices (Burkhard et al., 2001; Gruber and Lupas, 
2003; Lupas, 1996; Walshaw and Woolfson, 2001a,b). Again, the reader is 
referred to Chapter 3 for a full description of coiled-coil structures. 

Thus, the question in coiled-coil prediction and design is: what specific 
replacements are superimposed on the basic HPPHPPP pattern to direct 
the functional oligomerization state? This question was first tackled by 
Conway and Parry, who analyzed natural coiled-coil sequences that formed 
dimers and trimers (Conway and Parry, 1990, 1991). Woolfson and Alber 
(1995) advanced this approach by comparing amino-acid profiles for these 
two structures directly. The work that made the biggest impact on this 
issue, however, was the collaborative experimental study from the Kim and 
Alber laboratories using the GCN4 leucine-zipper peptide model system 
and mutants thereof. 

The first experimental study was the high-resolution X-ray crystal struc- 
ture of the leucine zipper peptide, GCN4-p1, itself (O’Shea et al., 1991), 
which revealed that the packing geometries of residues at a and d were 
different. The significance of this finding for oligomer state selection be- 
came clearer with Harbury’s studies of hydrophobic core mutants of the 
same peptide (Harbury et al., 1993, 1994). As demonstrated by several 
protein engineering studies (Woolfson, 2001), hydrophobic residues within 
the cores of globular proteins tend to be very tolerant of substitution by 
other hydrophobic side chains. This is not the case for multichain coiled-coil 
structures. GCN4-p1, which forms dimers exclusively, has Leu at all four d
positions, and 1 × Met, 1 × Asn, and 3 × Val at its a sites. Harbury has
described multiple core mutants of GCN4-p1 in which all a residues (except
Met-2) were simultaneously replaced by one of Ile, Leu, or Val, and all d
residues were simultaneously replaced by one of the same subset of aliphatic
side chains (Harbury et al., 1993). The peptides were generically named
GCN4-p-ad. For example, GCN4-p-IL refers to the GCN4-p1 sequence with
said a and d sites replaced by Ile and Leu, respectively. Seven of the nine
possible peptides have been characterized. The comparison of peptides
p-IL, p-II, and p-LI is interesting as they form dimers, trimers, and tetramers,
respectively. Structures for the new trimeric (Harbury et al., 1994) and 
tetrameric (Harbury et al., 1993) forms are available and these have helped 
rationalize the different oligomer state selections on the basis of different 
packing arrangements made within the cores of the structures. 

As O’Shea and colleagues (1991) have noted, the packing of side chains
at the a and d sites of the GCN4-p1 dimer is different. At the d position, the
Cα-Cβ bond vector of the side chain points into the interface and directly
towards the neighboring helix. This type of geometry, called perpendicular
packing by Harbury, precludes β-branched residues such as Ile and Val
from occupying these sites and favors Leu. This is the reason that the
leucine zipper is a leucine zipper: the hallmark of the bZIP transcription
factors is a run of four or so leucine residues spaced seven residues apart,
and in a heptad repeat these fall at the d sites (Landschulz et al., 1988). At
the a sites, however, the Cα-Cβ bond vector points out from the helical
interface, and this is termed parallel packing. As a result, the a site of
dimeric coiled coils is much more permissive of amino acid substitution
(Conway and Parry, 1990; Hu et al., 1990; Woolfson and Alber, 1995). Among
hydrophobic side chains, the β-branched residues Ile and Val are favored at a
because they contribute hydrocarbon back into the helical interface.
These different geometries for the dimer case are shown in Fig. 2. In
Harbury’s p-LI tetramer, the packing geometries are reversed compared
with the dimer. The packing at a is perpendicular, whereas the packing at
d is parallel. This fits with the swap of Ile and Leu residues at a and d
between the p-IL dimer and the p-LI tetramer.

Fig. 2. Schematic diagrams (left) and experimental examples (right) for different
core-packing geometries. (A) Parallel packing, showing an a layer from the GCN4
leucine zipper (O’Shea et al., 1991). (B) Perpendicular packing at a d layer in the same
structure. (C) Acute packing at a d layer in a structure of a trimeric GCN4 mutant
(Gonzalez et al., 1996c). Reproduced from Walshaw and Woolfson (2001b).

In summary, the relative order of the β-branched residues and leucine
at a and d has a major influence on oligomer state selection. The rule is
that Leu prefers perpendicular packing whereas the β-branched residues,
Ile and Val, prefer parallel packing (Table I). On this basis, it is of little
surprise that a retro-GCN4-p1 sequence (i.e., with a = Leu and d = Ile)
forms a tetramer and not a dimer (Mittl et al., 2000).

So what happens in the trimer? GCN4-p-II crystallizes as a trimer
(Harbury et al., 1994). In this case, the packing at a and d is closely similar:
it is intermediate between the perpendicular and parallel extremes 
(Fig. 2), and accordingly is referred to as acute packing. As a result, 
less selectivity for amino-acid type might be expected between the 
two sites. Indeed, this largely fits with Harbury’s experimental studies 
and with analyses of natural trimeric coiled-coil sequences (Conway and 
Parry, 1991; Woolfson and Alber, 1995) (Table I). Taking these studies 
together, the rule of thumb of a = d = Ile or Leu to specify trimers 
generally emerges. However, one further point is worth making here: the 
trimer state seems to be the default, and in the absence of strong sequence 
determinants of oligomer state, coiled-coil peptides and proteins will tend 
to form this structure. This is apparent from a number of studies using the 
GCN4-p1 background. First, Harbury’s peptides with all a = Val tend to
form mixed oligomer states in solution (Harbury et al., 1993). In addition,
variants in which the central a site, which is Asn in the wild-type
sequence, is replaced form mixtures of oligomer states (Gonzalez et al.,
1996a,b,c). These results suggest that, at least for the a position, Val
specifies the oligomer state very poorly compared with Ile. However, as
described below, Val in combination with certain polar residues at a, as in
wild-type GCN4-p1, does specify the dimer (Gonzalez et al., 1996c; Lumb and
Kim, 1995; Woolfson and Alber, 1995).
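
The core rules of thumb described above can be condensed into a small lookup.
The following Python sketch is illustrative only: the function name
predict_oligomer_state and the returned labels are assumptions made here, and
the table encodes nothing beyond the results quoted in the text (p-IL dimer,
p-II trimer, p-LI tetramer, the a = d = Leu rule of thumb for trimers, and
the poor specification by Val at a).

```python
# Rule-of-thumb lookup for uniform a/d cores (hypothetical helper, not a
# published predictor). Keys are (residue at a, residue at d).
CORE_RULES = {
    ("I", "L"): "dimer",     # p-IL: parallel packing at a, perpendicular at d
    ("I", "I"): "trimer",    # p-II: acute packing at both sites
    ("L", "L"): "trimer",    # rule of thumb: a = d = Ile or Leu
    ("L", "I"): "tetramer",  # p-LI: packing geometries reversed
}

def predict_oligomer_state(a_residue: str, d_residue: str) -> str:
    """Return the rule-of-thumb oligomer state for a uniform a/d core."""
    if a_residue == "V":
        # Peptides with all a = Val tended to give mixed oligomer states.
        return "mixed"
    return CORE_RULES.get((a_residue, d_residue), "unknown")

for name, (a, d) in {"p-IL": ("I", "L"), "p-II": ("I", "I"), "p-LI": ("L", "I")}.items():
    print(name, predict_oligomer_state(a, d))
```

Such a table ignores polar residues at a and the flanking e and g sites, so
it should be read as a summary of the text rather than as a predictive tool.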

Walshaw and Woolfson (2001b) have analyzed the packing geometries 
of all coiled-coil structures identified in the Brookhaven Protein Data 
Bank (release #87), and have largely confirmed Harbury’s experimental 
analysis. Furthermore, their work provided updated and structurally 
validated amino-acid profiles for the main coiled-coil quaternary struc- 
tures. For instance, in parallel dimers the rule of d = Leu combined with 
a = Ile or Val, together with rules for polar residues at a (see below), 
largely holds up (Table I). These rules are, of course, rules of thumb and 
considerable variation is observed in natural sequences. Nonetheless, they 
are extremely powerful for specifying the oligomer state of designed 
coiled-coil peptides with confidence. Furthermore, several protein engi- 
neering and redesign studies have reproduced these observations and 
put measures on the thermodynamic effects of making certain hydro- 
phobic mutations at the core a and d sites (Moitra et al., 1997; Tripet 
et al., 2000; Wagschal et al., 1999). More recently, researchers have begun 
to use computational methods (Havranek and Harbury, 2003; Keating 
et al., 2001) and to incorporate nonnatural amino acids to probe, under- 
stand, specify, and stabilize coiled-coil hydrophobic interfaces (Bilgicer 
et al., 2001; Schnarr and Kennan, 2001, 2002, 2003; Yoder and Kumar, 
2002).

Before leaving the subject of the hydrophobic component of coiled-coil
interfaces, it is worth noting that hydrophobic residues at e and g and 
possibly at b and c also influence oligomer state selection. In coiled-coil 
sequences, the frequencies of hydrophobic residues at these sites in- 
crease with oligomer state (Conway and Parry, 1990, 1991; Walshaw and 
Woolfson, 2001b; Woolfson and Alber, 1995). Likewise, in coiled- 
coil structures these sites are increasingly buried and involved in the 
interface (Harbury et al., 1993; Kajava, 1996; Walshaw and Woolfson, 
2003). Though these observations need further experimental testing 
and verification, potentially they provide a further route to designing 
coiled coils of different oligomer states. 


C. Salt-Bridge Interactions and Partner Selection 


It has long been appreciated that residues that flank the main hydrophobic
seam of the coiled-coil interface (i.e., residues at e and g and, more
recently, those at b and c) are likely to contribute to coiled-coil specificity
and stability (McLachlan and Stewart, 1975; McLachlan et al., 1975).
Specifically, in parallel coiled-coil structures, the g site of one helix is
brought close to the e site of the successive heptad of a neighboring helix
(Fig. 1C) in a so-called g_i:e'_{i+1} interaction. As will be discussed later, in
antiparallel structures the analogous interactions are e:e’ and g:g’ (Fig. 
1D). This difference may be important in distinguishing between parallel 
and antiparallel structures in nature, and it also provides the protein 
designer with excellent positive and negative design rules. 

At least from the analyses of amino-acid profiles for parallel dimers 
and trimers (Conway and Parry, 1990, 1991; Lupas et al., 1991; Walshaw 
and Woolfson, 2001b; Woolfson and Alber, 1995), it is beyond doubt that 
the placement of oppositely charged side chains at these sites must play a 
part in coiled-coil assembly, structure, and stability. Likewise, the proximi- 
ty of oppositely charged side chains emanating from these sites in experi- 
mentally determined structures and the postulated resulting salt bridges 
add further compelling evidence for the importance of such interactions 
(O’Shea et al., 1991). Nonetheless, and despite many experimental studies 
to probe these interactions, the issue remains controversial (Lavigne et al., 
1996; Lumb and Kim, 1996). Specifically, the question is whether g_i:e'_{i+1}
interactions stabilize and specify coiled-coil structures, or if they simply
help specify partner selection and helix orientation. 

Although this may seem like dodging an important issue (clearly, in
protein design it is important to understand how protein sequence relates
to both protein structure and stability), for the purposes of this review it
is an academic question. This is for two reasons, which will become clear in
the examples of successful coiled-coil designs. First, stability tends not to
be an issue when designing coiled coils, as plenty can be gained from the
core interactions alone. Second, through the elegant studies of O’Shea
and colleagues on the Fos:Jun system, it is clear that charge-charge com- 
plementarity in coiled-coil interfaces plays a significant role in partner 
selection (O’Shea et al., 1989, 1992). 

On this basis, coiled-coil designers often place the oppositely charged
residues Lys and Glu at complementary g and e sites to direct the assemblies
that they are targeting (positive design), and to make the corresponding
potential interactions in alternative structures repulsive (negative design).
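
The positive and negative design logic can be made concrete by counting
g_i:e'_{i+1} charge pairs. The sketch below is a simplified illustration, not
a published scoring scheme: the function ge_pair_score, the two all-Glu and
all-Lys test peptides, and the treatment of every like-charge pair as equally
repulsive are assumptions introduced here.

```python
# Count g_i:e'_{i+1} charge pairs in a parallel two-stranded coiled coil.
# Each helix is described only by its e and g residues, one tuple per heptad.
CHARGE = {"E": -1, "D": -1, "K": +1, "R": +1}

def ge_pair_score(helix1, helix2):
    """Return (attractive, repulsive) counts of interhelical g:e' pairs.

    helix1 and helix2 are lists of (e, g) residue tuples, one per heptad.
    The g site of heptad i on one helix is paired with the e site of
    heptad i+1 on the other helix, in both directions.
    """
    attractive = repulsive = 0
    for first, second in ((helix1, helix2), (helix2, helix1)):
        for (_, g_res), (e_res, _) in zip(first, second[1:]):
            product = CHARGE.get(g_res, 0) * CHARGE.get(e_res, 0)
            if product < 0:
                attractive += 1
            elif product > 0:
                repulsive += 1
    return attractive, repulsive

all_glu = [("E", "E")] * 4  # hypothetical four-heptad peptide, Glu at e and g
all_lys = [("K", "K")] * 4  # hypothetical partner, Lys at e and g

print("heterodimer:", ge_pair_score(all_glu, all_lys))
print("homodimer:  ", ge_pair_score(all_glu, all_glu))
```

In this toy case every pair in the heterodimer is attractive and every pair
in either homodimer is repulsive, which is the combination of positive and
negative design described above.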


D. Buried Polar Groups: The Icing on the Cake 


Returning to the a and d sites of coiled-coil interfaces, they are not the 
exclusive province of hydrophobic residues: polar residues are found here 
and, in many cases, they are highly conserved. In retrospect, this is also 
apparent in the amino-acid profiles of dimeric and trimeric coiled coils 
(Conway and Parry, 1990, 1991; Lupas et al., 1991). Furthermore, inspec- 
tion of these profiles reveals trends in the data; for instance, basic residues 
occur frequently at the a sites of dimeric coiled coils (Conway and Parry, 
1990). The real importance of such inclusions, however, only became 
apparent with the determination of the structure of the leucine-zipper 
peptide GCN4-p1 (O’Shea et al., 1991).

Position 16 of GCN4-p1, an a site, is Asn, which is conserved across
many leucine-zipper sequences. In the structure, this residue is buried and 
makes an asymmetric side chain-side chain hydrogen bond with the 
corresponding Asn in the partner strand. This Asn-Asn’ pair appears to 
play a number of roles, some of which have been subsequently verified 
experimentally. First, the inclusion of the pair in the otherwise hydropho- 
bic core is destabilizing, and replacement of the residue by the canonical 
Val increases coiled-coil stability considerably (Harbury et al., 1993). There 
may be biological significance to this in, for instance, modulating dimer 
stability for control and protein turnover in the cell. However, what is 
more dramatically obvious in the Asn16Val mutant is its loss of oligomer- 
state specificity: it forms mixtures of oligomers (Harbury et al., 1993). 
Thus, second, Asn-16 is essential for specifying the dimer state of
GCN4-p1. Third, the residue may be important in destabilizing other alternative
structures, such as antiparallel and out-of-register structures, both of which 
would result in two buried and noncomplemented Asn side chains. 

The importance of Asn at a in specifying the dimer is further demon- 
strated by the comparison of amino-acid profiles for dimeric and trimeric 
coiled coils (Woolfson and Alber, 1995). In the normalized profiles, Asn at
a is significant in dimers, but not in trimers. Similarly, and following on
from the earlier studies of Conway and Parry (1990), the profiles from
Woolfson and Alber show that the basic residues Lys and Arg occur
significantly at the a sites of dimers, but not in trimers. With regard to
promoting the trimer, the residues Gln, Ser, and Thr are all favored at the
a sites of trimer sequences, but not in dimers. A number of experimental
studies lend support to these correlations, and provide structural ratio- 
nales for the inclusion and accommodation of polar residues in coiled-coil 
interfaces (Akey et al., 2001; Campbell and Lumb, 2002; Campbell et al., 
2002; Eckert et al., 1998; Gonzalez et al., 1996c; Lumb and Kim, 1995; 
Oakley and Kim, 1998). For instance, inclusion of Lys at a appears to 
specify dimers, not because of specific interhelix interactions in this state, 
but because the charged side chain cannot escape the core of higher-order
states such as the trimer, which are therefore compromised (Gonzalez 
et al., 1996c). 

In summary, the inclusion of polar residues in the core, even when their
hydrogen-bonding potentials are satisfied, is destabilizing. However, though
the burial of a polar residue may destabilize the target structure, it may
destabilize alternative structures even more (Gonzalez et al., 1996c). In
this respect, the use of buried polar residues provides a very powerful
negative-design principle for coiled-coil design.


III. Key Coiled-Coil Designs


The author recognizes that some readers will find it difficult to visualize 
the sequences given below in the context of coiled-coil structures. For this 
reason, it is suggested that readers consult the original articles, nearly all 
of which give annotated helical-wheel diagrams for the design sequences 
or, preferably, construct their own annotated helical wheels using the 
sequences given and blank diagrams of the type shown in Fig. 1C and D. 


A. Parallel Structures 


1. The Original Hodges Design: An 86-residue Analog of Tropomyosin (1981) 


To the author’s knowledge, Hodges and colleagues presented the first 
discrete coiled-coil peptide of de novo design (Hodges et al., 1981). This 
design followed earlier studies by the group on polymers of the heptapep- 
tide repeat sequence KLESLES. The design principles, which are based on 
an analysis of the amino-acid composition of tropomyosin, are clear and 
were well ahead of their time. The peptide comprised three blocks,
A (KCAELEG), B (KLEALEG), and C (KLEALEGK), assembled in the
order AB4C (one A block, four B blocks, and one C block) to give a
43-residue peptide, which was air-oxidized through the N-terminal Cys to
form an 86-residue covalent dimer. The main features of this design were
the straightforward a = d = Leu core, the complementary g:e' Lys:Glu pairs,
and the unusual Gly residues at f, which were included for
synthetic reasons. Note that the heptad repeat is denoted gabcdef in
this design. The dimer exhibited considerable helicity and thermal stability 
consistent with a fully folded coiled coil. This original design has spawned 
considerable work in coiled-coil design and redesign from the Hodges 
group (Kohn and Hodges, 1998) and others (Schneider et al., 1998). 
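
As a check on the block arithmetic, the peptide can be assembled and its
heptad register assigned in a few lines of Python. This is a minimal sketch:
the variable and function names are invented here, and the use of four copies
of block B is an assumption, chosen because it is the count that reproduces
the stated 43-residue length.

```python
# Assemble the Hodges peptide from the three blocks quoted in the text and
# label each residue with its heptad position (register written g a b c d e f).
BLOCK_A = "KCAELEG"
BLOCK_B = "KLEALEG"
BLOCK_C = "KLEALEGK"
REGISTER = "gabcdef"

# Assumption: four B blocks, giving 7 + 4 * 7 + 8 = 43 residues.
peptide = BLOCK_A + BLOCK_B * 4 + BLOCK_C
positions = [REGISTER[i % 7] for i in range(len(peptide))]

def residues_at(site):
    """Collect, in sequence order, the residues found at one heptad position."""
    return "".join(res for res, pos in zip(peptide, positions) if pos == site)

print(len(peptide), 2 * len(peptide))  # monomer and disulfide-linked dimer
for site in "adegf":
    print(site, residues_at(site))
```

Listing the residues by position makes the design features easy to see: Leu
at a and d (apart from the single Cys used for oxidation), Lys at g, Glu at
e, and Gly at f.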


2. Coil-Ser: A Peptide Designed as Parallel Homodimer that 
Forms an Up-Up-Down Homotrimer (1990) 


O’Neil and DeGrado (1990) adapted the original Hodges sequence to 
engineer a coiled-coil-based host-guest system for determining a thermo- 
dynamic scale of the helix-forming tendencies of the amino acids. Based 
on Hodges’ building block—KLEALEG—they described a four-heptad 
peptide. Some significant changes were made, however. The first a position
and the last f site were Trp and His, respectively, to facilitate spectroscopic
characterization. Also, the glycine residues at the f sites of the first and
third heptads were replaced by Lys to improve helical propensity, and the
remaining f position of the second heptad was used as the guest site and
substituted by all 20 amino acids plus Aib. The stabilities of the resulting
21 peptides were determined by urea denaturation experiments. These
showed a striking correlation with statistical measures of α-helix propensity.

The determination of the crystal structure of coil-Ser provided an 
interesting twist on this story (Lovejoy et al., 1993). Rather than the parallel 
dimer expected, coil-Ser formed an up-up-down homotrimer (Fig. 3). 
Trimerization in solution was also confirmed by analytical ultracentrifuga- 
tion. The likely reasons for this were the predominantly all-Leu core, and 
the inclusion of Trp at the first a position. As noted above, the a = d core
favors trimers, and Trp is too bulky for all three Trp residues to be 
accommodated within one coiled-coil layer, hence the switch of one helix 
to give the up-up-down arrangement. As noted by DeGrado and collea- 
gues, this result did not affect the scale of helix-forming tendencies 
originally reported using the coil-Ser system (Lovejoy et al., 1993). 


3. Peptide Velcro: A Parallel Heterodimer (1993) 


The “Peptide Velcro” design by O’Shea et al. (1993) was based on their
experience with the GCN4-p1 and Fos/Jun systems. The core design, which
preceded Harbury’s work on oligomer-state selection (Harbury et al., 1993),
was influenced by the foregoing Hodges and DeGrado designs
and was straightforward, with four a = d = Leu heptads. However, the
system did not form trimers because of the GCN4-inspired inclusion of
Asn at the central a position (O’Shea et al., 1991). In one partner, Acid-p1,
Glu was placed at all e and g sites, while its complement, Base-p1, had Lys
at these positions. Thus, the idea of the design was to draw the two
peptides together with a full set of complementary g_i:e'_{i+1} pairings
between the partner peptides. Consistent with the design principles, the
isolated peptides did not fold in water but, when mixed, formed a stable
1:1 complex that was fully helical and unfolded cooperatively. The
dissociation constant for this design is an impressive 30 nM (20°C, pH 7).
Additional experiments revealed that, as in the Fos/Jun system, repulsive
electrostatic interactions in the alternative homodimers direct
heterospecificity more than the favorable electrostatic interactions in the
targeted heterodimer.

Fig. 3. The ribbon structure of coil-Ser [pdb identifier, 1cos; (Lovejoy et al., 1993)].
The structure reveals a mixed parallel/antiparallel three-helix bundle, which is
influenced by the three core tryptophan side chains, shown in space-filling
representation.

Interestingly, when the buried Asn was replaced by Leu, structural
specificity was lost and the peptides formed a heterotetramer without
fixed helix-helix orientations (Lumb and Kim, 1995). In a further report
on this system, Sia and Kim (2001) showed that the two copies of Acid-p1
in the heterotetramer could be replaced by a peptide, D-Acid, in which all
of the residues were D-amino acids. This clever redesign was
guided by helical-net diagrams that considered core packing in the D/L
structure. These revealed a:d' rather than a:a' and d:d' layers in the core,
which were accommodated in the new design by a half-heptad shift of a
redesigned L-Base sequence.

While on the topic of heterodimer design, Vinson and coworkers have had 
considerable success in making heterodimerizing leucine zippers with an 
impressive range of properties, including very tight binding (Moll et al., 2001; 
Vinson et al., 1993). Hodges and colleagues have presented a series of related
designs, E/K coils, which they propose for use in biotechnology as, for 
example, capture reagents for affinity chromatography and biosensor-based 
applications (Chao et al., 1998; Litowski and Hodges, 2001, 2002). 


4. ABC: A Parallel Heterotrimer (1995) 


The ABC design of Nautiyal et al. (1995) represented a step up in
complexity and a considerable increase in the degree of difficulty over
foregoing heterodimeric coiled-coil designs. Again, this was a four-heptad
design with a straightforward a = d = Ile core and a single polar residue
(Gln) at a as a failsafe for parallel trimer formation. With only two charges
to play with and three helical interfaces to specify, the selection of e and g
residues used an algorithm to choose from more than 16 million peptide
combinations. The solution maximized salt bridges in the targeted parallel
heterotrimer, and minimized them in all alternative combinations. In this
respect, the approach used both positive- and negative-design principles.
Two solutions to the problem were returned: the one synthesized, and its
mirror image, which reversed the order of helical packing. One failing of
the design was that the individual peptides did trimerize, as did the binary
combinations. However, none of these combinations matched the heterotrimer
in thermal stability. An important and impressive aspect of this design was
that its crystal structure confirmed all features of the design, including
the correct A-B-C helical packing arrangement (Nautiyal and Alber, 1999).
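
The size of the search space quoted above is consistent with a simple count,
sketched below. The counting scheme is an assumption made here and is not
stated in the text: it supposes that every e and g site in each of the three
four-heptad peptides is chosen independently from two charged residues (for
example, Glu and Lys).

```python
# Size of the e/g design space for a three-peptide, four-heptad system,
# assuming (hypothetically) a free binary choice at every e and g site.
N_PEPTIDES = 3
N_HEPTADS = 4
SITES_PER_HEPTAD = 2   # the e and g positions
CHOICES_PER_SITE = 2   # two charged residues, e.g. Glu or Lys

n_sites = N_PEPTIDES * N_HEPTADS * SITES_PER_HEPTAD
n_combinations = CHOICES_PER_SITE ** n_sites

print(n_sites, "variable sites give", n_combinations, "combinations")
```

Under these assumptions there are 24 variable sites and 2^24 combinations,
which is a little over 16 million; a space of this size is too large to
inspect by hand but small enough to enumerate computationally.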


5. A Heterotetramer: Extending the Coiled-Coil Interface (1996)


Based on the natural sequence of the core domain of the Lac repressor, 
which tetramerizes, Fairman et al. (1996) have engineered two 21-residue 
peptides, Lac21E and Lac21K. In isolation, the peptides did not fold
appreciably, but when combined they formed thermostable tetramers. In
these short sequences, the d residues were Leu, and the three a sites
were Met, Leu, and Val, respectively. Interestingly, the e and g positions
were combinations of Ala, Glu, Gly, and Ser. This left the specification of
the structure to charged residues at the b and c positions, which were
all Glu in Lac21E and all Lys in Lac21K. This idea of extending the coiled-
coil interfaces beyond the a and d seam and the e and g flanking residues 
to b and c has been noted elsewhere for coiled-coil structures (Harbury 
et al., 1993; Kajava, 1996; Walshaw and Woolfson, 2003) and sequen- 
ces (Conway and Parry, 1991; Walshaw and Woolfson, 2001b; Woolfson 
and Alber, 1995). 


6. Anti-APCp1: A Coiled-Coil Probe for the APC Tumor 
Suppressor Protein (1998) 


Mutations of the human APC gene are associated with both sporadic 
and familial forms of colon cancer. The APC protein is a large, multi- 
domain protein that has a 55-residue, N-terminal dimeric coiled coil (APC- 
55). Alber and colleagues used rules of thumb and those derived from an 
analysis of the covariation of a:a' and d:d' pairs in the cytokeratins (which
form obligate heterodimers) to create a mutant of APC-55, anti-APCp1, as 
a potential probe for the APC protein (Sharma et al., 1998). 

In all, 20 mutations were described that can be grouped as follows. First, 
five changes were made at the a and d sites based on the analysis of 
the keratins. Second, eight changes were made at the e and g sites to 
introduce salt bridges to favor the anti-APCp1:APC-55 heterodimer and 
to destabilize homodimerization of anti-APCp1. Finally, seven changes
were made at the e, f, and g positions to improve the charge and the
helicity of the peptide and to introduce chromophores. Although
anti-APCp1 did form stable oligomers, these were of higher order than the
dimer. Moreover, when mixed with APC-55, a stabilized heterodimer was
formed as judged by solution-phase biophysics and gel electrophoresis. In 
addition, the peptide probe pulled down wild-type and mutant forms of 
APC from extracts of cancer cell lines. 


7. RH: Designed Right-Handed Coiled Coils (1998) 


Harbury et al. (1998) described a series of peptides designed to form
dimeric, trimeric, and tetrameric helical bundles with right-handed
supercoils. To achieve right-handed coiled coils, rather than canonical
left-handed structures, an HP pattern based on an 11-residue abcdefghijk
repeat was used as a template. Combinations of the hydrophobic residues Ala,
Ile, Leu, Val, allo-Ile, and nor-Val were considered for the a, d, and h
sites. The novel inclusion of the nonstandard residues was based on
preliminary modeling studies. A computer algorithm was written to choose
optimal combinations of these residues for the targeted dimer, trimer, and
tetramer states. An additional novel feature of the design protocol, which
advanced protein design in general, was the inclusion of some backbone
flexibility in the modeling algorithm. Together, these novel features allowed
the sequence of a, d, and h sites to be optimized for the dimer, trimer, and
tetramer states. Solution-phase characterization of three corresponding
peptides (RH2, RH3, and RH4) confirmed the designs as stable, cooperatively
folded helical bundles of the correct oligomer state, though RH2 was only
marginally stable. Most importantly for the confirmation of the design
process, the structure of RH4 was described. This revealed a right-handed
coiled-coil structure with core packing and other features that matched
the design model in atomic detail. The structure of RH4 is compared with
the left-handed coiled-coil tetramer GCN4-p-LI in Fig. 4.

Fig. 4. (A) An end view of the GCN4-p-LI (Harbury et al., 1993) tetramer
backbone showing the left-handed supercoil. (B) Similar projection for RH4
(Harbury et al., 1998) showing the slighter supercoil in the opposite sense.
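
The link between repeat length and supercoil sense can be illustrated with a
short calculation. The sketch below rests on assumptions that are standard
but not stated in the text: an undistorted α-helix has about 3.6 residues per
turn, a heptad repeat spans two helical turns, and an 11-residue repeat spans
three. The function name supercoil_sense is invented for the illustration.

```python
# Compare the periodicity of the hydrophobic stripe with that of an
# undistorted alpha-helix (about 3.6 residues per turn, a textbook value).
ALPHA_HELIX_RESIDUES_PER_TURN = 3.6

def supercoil_sense(repeat_length, turns_per_repeat):
    """Return (residues per turn of the repeat, expected supercoil sense)."""
    residues_per_turn = repeat_length / turns_per_repeat
    if residues_per_turn < ALPHA_HELIX_RESIDUES_PER_TURN:
        # The hydrophobic stripe falls behind the helix and winds leftwards.
        sense = "left-handed"
    elif residues_per_turn > ALPHA_HELIX_RESIDUES_PER_TURN:
        # The stripe runs ahead of the helix and winds rightwards.
        sense = "right-handed"
    else:
        sense = "straight"
    return residues_per_turn, sense

print("heptad:    ", supercoil_sense(7, 2))
print("11-residue:", supercoil_sense(11, 3))
```

The heptad periodicity (3.5) lies further from 3.6 than the 11-residue
periodicity (about 3.67) does, which agrees qualitatively with the slighter
supercoil seen for RH4.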

On the theme of noncanonical coiled coils, Hicks et al. (2002) have 
investigated the effects of a range of inserts—of between one and 
seven Ala residues—at the center of an otherwise canonical designed four- 
heptad leucine-zipper system. All of the inserts were destabilizing, but the 
four-residue insert, which results in a heptad-hendecad-heptad-heptad 
(7-11-7-7) sequence, was the least destabilizing. Similarly, though for differ- 
ent reasons, Hodges and colleagues have also made and characterized 
canonical coiled coils as hosts for peptide inserts (Kwok et al., 2002). 


8. SAF: An Offset Heterodimer that Promotes Fiber Formation (2000) 


Pandya et al. (2000) have described the design of a sticky-ended, or
offset, leucine-zipper-based heterodimer (Fig. 5A). This is unusual
because all known natural coiled coils are blunt ended, presumably as a
consequence of the hydrophobic effect, although Talbot and Hodges
(1982) have commented on possible (though unintentional) promiscuous
sticky-ended assembly in early coiled-coil designs. In the design from
Pandya and colleagues, the free ends of the sticky-ended dimer were made
complementary to promote longitudinal assembly into extended
self-assembled protein fibers (SAFs). The features of the design included a
core that promotes dimerization with a = Ile and d = Leu, and residues at
e and g that promote sticky-end assembly: the N-terminal halves of the
peptides had e = g = Lys, and the corresponding residues in the C-terminal
halves were Glu. However, the key novel feature of this design is the offset
placing of complementary Asn residues at a C-terminal a site in one peptide
and an N-terminal a site in the other. In the absence of a high-resolution
structure, the design was confirmed using a combination of spectroscopy
(circular and linear dichroism), electron microscopy, and X-ray fiber
diffraction (Pandya et al., 2000). The peptides assembled into linear,
nonbranched fibers tens of microns long and approximately 50 nm thick
(Fig. 5B). The thickness of the fibers was unexpected (coiled-coil dimers
are typically ~2 nm thick) and reflected assembly of 2 nm protofibrils
into the matured fibers, which in turn indicated that the protofibrils with
helically repeated building blocks were inherently sticky.

The SAF design has been built upon considerably over the past two years
(Ryadnov and Woolfson, 2003a,b, 2004). Others are pursuing coiled-coil
designs as a route to self-assembling fibrous structures. Indeed, the first
example dates back to 1997 (Kojima et al., 1997). More recently, Kajava
and colleagues have described the design and characterization of
self-assembling fibers based on a novel pentameric coiled-coil building
block (Kajava et al., 2004; Melnik et al., 2003; Potekhin et al., 2001), and
Conticello and colleagues have described fibrous structures based on a
single, 42-residue leucine zipper-based helix (Zimenkov et al., 2004). On a
related theme, Ghosh and colleagues have reported fibers produced by
combining acidic (EZ) and basic (KZ) a = d = Leu coiled-coil peptides
(Zhou et al., 2004). The novelty in this work was that each peptide was
tethered to a core dendrimer (D) to give constructs D-EZ4 and D-KZ4,
which were then used in assembly. Along with other work, for example in
the development of coiled coil-based hydrogel systems (Petka et al., 1998;
Wang et al., 1999), these papers represent an exciting beginning for the
application of coiled-coil design in the area of the rational design and
exploitation of nanostructured biomaterials. Some of these papers and
this field have been reviewed recently (MacPhee and Woolfson, 2004;
Yeates and Padilla, 2002).

Fig. 5. (A) Molecular model for the designed SAF sticky-ended heterodimer. The
Lys and Glu residues at the e and g sites are blue and red, respectively; the buried,
offset Asn residues at a are green. Adapted from Pandya et al. (2000). (B) Negative-stain
transmission electron micrograph image for fibers assembled from the sticky-ended
building blocks. Adapted from MacPhee and Woolfson (2004).


9. The Shortest Designed Coiled-Coil Peptide (2000)


In three recent publications (Burkhard et al., 2000b, 2002; Meier et al.,
2002), Burkhard and colleagues have described the design, redesign, and
characterization of a two-heptad coiled-coil system,
Suc-DELEARIRELEARIK-NH2. This unit was stabilized by hydrophobic
interactions (a = Ile, d = Leu) and a network of salt-bridge interactions.
Initial characterization of the peptide showed that it was helical and
dimeric. However, at increased ionic strength, the oligomer state switched
to trimer. A structure for this form is available. Encouragingly, the
introduction of improved salt-bridge interactions, Suc-EELRRRIEELERRIR-NH2,
did further specify the dimer state in solution, though the structure
deposited in the Protein Data Bank for this peptide is still for the trimer.
Likewise, removal of salt bridges, Suc-DELERAIRELAARIK-NH2, resulted in the
loss of coiled-coil specificity and the formation of a noncoiled-coil
octamer. It is probable that the shortness of this design contributes to the
change in oligomer state and overrides the influence of the core residues
(a = Ile, d = Leu), which would normally specify the dimer. In other words,
the differences in stability between the alternative structural states of
the unit are probably marginal and, thus, small changes in the noncovalent
interactions affect the equilibria between these states significantly.


10. Coiled Coils Designed to Switch Conformational 
State (2002 and 2003) 


Coiled-coil motifs have been known to play roles in conformational 
switching in natural proteins for some time (Oas and Endow, 1994). The 
key examples are influenza hemagglutinin (Bullough et al., 1994; Carr and 
Kim, 1993; Carr et al., 1997), and the heat shock transcription factor
(Rabindran et al., 1993). Furthermore, an engineered form of GCN4-p1, 
with Asn-16 replaced by Ala, switches from dimer to trimer upon addition of 
cyclohexane or benzene (Gonzalez et al., 1996b). In the presence of benzene, 
the mutant peptide crystallizes as a parallel trimer with a single 
benzene molecule bound at a cavity bordered by Leu-12, Ala-16, and Leu-19. 

Ciani et al. (2002) have described the de novo design of Template-α, 
a 29-residue peptide with a canonical dimeric leucine zipper repeat, 
aAALEQK (where a = Val, Lys and Ile). As expected, the peptide folded to 
a helical dimer that unfolded reversibly and in a concentration-dependent 
manner upon heating in solution. The peptide design was also compatible 
with an antiparallel β-hairpin structure. To further enhance β-sheet 
propensity, a mutant peptide, Template-αT (in which the Gln residues at the 
f sites of Template-α were replaced by Thr), was made. Consistent with this 
dual-sequence design principle, thermal unfolding of the mutant resulted 
in the formation of β-structure, as judged by CD spectroscopy, and 
amyloid-like structures, as observed by transmission electron microscopy. On a 
similar note, Kammerer et al. (2004) have described a shorter designed 
coiled-coil peptide, ccβ, which crystallized as a trimer and transformed to 
amyloid-like structures upon heating. These systems present possibilities 
for assessing how α-helical structures switch to β-sheet-based amyloid-like 
structures, and this may shed light on how natural proteins transform to 
amyloid in certain diseases. 

Related to these ideas, Pandya et al. (2004) have described the design of 
an antiparallel coiled-coil (helix-loop-helix) peptide, which is stabilized by 
a disulfide bridge between the termini of the peptide. Reduction of the 
disulfide triggered a switch to a dimeric leucine zipper. 


11. Belt-and-Braces: A Leucine Zipper-Based Nanoscale 
Linker System (2003) 


The design and engineering of systems to control the assembly of 
structures and functional modules with nanometer precision is one goal 
of the burgeoning discipline of nanotechnology. Designed coiled-coil 
systems offer great possibilities in this area for several reasons. First, coiled 
coils can be considered as relatively stiff nanoscale rods. Second, there is a 
precise relationship between peptide length and coiled-coil length, as 
each heptad meters out ~1 nm. Finally, coiled coils fold and self-assemble 
to stable structures at µM concentrations, or lower, in water. 
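
As a rough illustration of the length relationship just described, the following Python sketch converts a residue or heptad count into an approximate rod length. The axial rise of 0.148 nm per residue is an assumed value chosen to reproduce the rule of thumb of about 1 nm per heptad; it is not taken from this review, and the function names are hypothetical.

```python
HEPTAD_LENGTH = 7            # residues per heptad repeat
RISE_PER_RESIDUE_NM = 0.148  # ASSUMED axial rise per residue (about 1.04 nm per heptad)


def coiled_coil_length_nm(n_residues, rise_nm=RISE_PER_RESIDUE_NM):
    """Approximate length of a coiled-coil rod built from n_residues per chain."""
    if n_residues < 0:
        raise ValueError("n_residues must be non-negative")
    return n_residues * rise_nm


def heptads_to_length_nm(n_heptads, rise_nm=RISE_PER_RESIDUE_NM):
    """Approximate length of a coiled coil that is n_heptads long."""
    return coiled_coil_length_nm(n_heptads * HEPTAD_LENGTH, rise_nm)


if __name__ == "__main__":
    for n in (1, 3, 6):
        print(f"{n} heptad(s): about {heptads_to_length_nm(n):.1f} nm")
```

Under these assumptions a six-heptad peptide comes out at a little over 6 nm; flexible linkers or attached cargo would add to that figure.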

Ryadnov et al. (2003) have designed a coiled coil-based nanoscale linker 
system dubbed Belt-and-Braces. The system was novel in a number of 
respects. Though based on a leucine-zipper dimer design, it was a ternary 
system in which one peptide (the “Belt”) templated the assembly of two 
half-sized peptides (the “Braces”); thus, the system was the first and 
simplest example of a coiled-coil vernier assembly (Kelly et al., 1998). 
The Belt-and-Braces design employed all of the key design rules for a 
parallel coiled-coil dimer: that is, a = Ile, d = Leu, and the g:e’ pairs are 
Lys:Glu or Glu:Lys. Ternary complexation was directed by the Belt, which 
was a six-heptad acid peptide with the general (g-f) heptad repeat, 
EIAALEQ, that templated the two three-heptad Braces BN and BC, with 
the repeat sequence KIAALKQ. Specificity was achieved by complementary 
Asn residues at the fifth a site of the Belt and the central a site of BC. 
Finally, the N- and C-termini of BN and BC, respectively, were extended by 
Cys-Gly-Gly- and -Gly-Gly-Cys units. The construct was expected to span 6 to 
7 nm. The folding and assembly of the naked Belt-and-Braces peptides was 
confirmed by a combination of CD spectroscopy, analytical ultracentrifu- 
gation (AUC), and surface plasmon resonance (SPR) using Biacore. 
Moreover, the assembly of bound cargo was demonstrated as follows: the 
Cys-termini of the Brace peptides were labeled with 15 nm colloidal gold 
particles and mixed with the Belt. Transmission electron microscopy 
(TEM) visualization of the resulting assemblies revealed networks of 
nanoparticles separated by approximately 7 nm, consistent with the design. 
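
The heptad bookkeeping behind such a design can be sketched in a few lines of Python. The example below labels each residue of the idealized repeats EIAALEQ and KIAALKQ with its heptad position, reading each repeat in the (g-f) register used in the text. It is an illustration only: the function names are hypothetical, and the idealized sequences deliberately ignore the Asn substitutions and the Cys-Gly-Gly extensions of the real peptides.

```python
HEPTAD_ORDER = "abcdefg"


def assign_register(sequence, start="g"):
    """Pair every residue with its heptad position, beginning at `start`."""
    offset = HEPTAD_ORDER.index(start)
    return [(HEPTAD_ORDER[(offset + i) % 7], res) for i, res in enumerate(sequence)]


def residues_at(sequence, position, start="g"):
    """List the residues that occupy one heptad position (for example 'a')."""
    return [res for pos, res in assign_register(sequence, start) if pos == position]


if __name__ == "__main__":
    # Idealized repeats only; the published peptides differ in detail.
    belt = "EIAALEQ" * 6
    brace = "KIAALKQ" * 3
    for name, seq in (("Belt", belt), ("Brace", brace)):
        summary = {p: sorted(set(residues_at(seq, p))) for p in "adeg"}
        print(name, summary)
```

For these idealized repeats the sketch reports Ile at a and Leu at d in both peptides, with Glu at e and g of the Belt and Lys at e and g of the Brace, which is the charge complementarity the design relies on.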

Stevens et al. (2004) reported a similar leucine-zipper-like linker system. 
This comprised two six-heptad peptides (d = Leu, a = Val, Ala, and Ile), 
one of which was basic with predominantly Lys at e and g, while the other 
was acid with predominantly Glu at e and g. Assembly was confirmed by CD 
spectroscopy and visualized by coupling the peptides to gold nanoparticles 
followed by TEM. A nice touch in this work was the use of different size 
nanoparticles to create binary assemblies including satellite structures, in 
which 8.5 nm particles (derivatized with the acid peptide) were organized 
around 53 nm particles (derivatized with the basic peptide). 

The designs by Hodges and colleagues in the section on heterodimeric 
coiled-coil systems (Chao et al., 1998; Litowski and Hodges, 2001, 2002), 
and by Ghosh et al. (2000) in the next section, provide other examples of 
coiled-coil systems as potential peptide linkers. 


B. Antiparallel Structures 


With the exception of the discussion centered on the crystal structure of 
coil-Ser, which forms a mixed parallel/antiparallel trimer (Lovejoy et al., 
1993), this review has so far ignored antiparallel coiled-coil structures. This 
is for a number of reasons. First, our understanding of antiparallel coiled 
coils is not as advanced as that for the parallel structures (Oakley and 
Hollenbeck, 2001; Walshaw and Woolfson, 2001b). Second, and as a result 
of the first point, fewer design attempts have targeted antiparallel coiled 
coils. Finally, there is a very good recent review on antiparallel coiled-coil 
structures and design (Oakley and Hollenbeck, 2001). Nonetheless, this 
Chapter would not be complete without the inclusion of these design 
attempts. Beforehand, however, the main differences between antiparallel 
and parallel coiled-coil structures are summarized. Because of the rela- 
tive dearth of experimental data on other antiparallel structures, discus- 
sion is limited to heptad-based antiparallel dimers that, like the parallel 
structures, have left-handed supercoils. 

In order to maintain the register of hydrophobic residues between two 
antiparallel coiled-coil strands, a residues from one strand must pair with 
d residues of the other (Fig. 1). Thus, whereas in parallel dimers alternating 
core layers comprise a:a’ and d:d’ residues, antiparallel dimers have 
alternating a:d’ and d:a’ layers. On this basis, the stereochemical packing of 
mixed a:d’ layers might be expected to differ from those in the symmetric 
layers of parallel structures. However, analysis of the known antiparallel 
dimer structures shows that packing geometries of the a and d side chains 
can still be considered as roughly parallel and perpendicular, respectively 
(Walshaw and Woolfson, 2001b), although both distributions of angles 
are shifted slightly towards acute (trimer) packing. This is reflected in the 
amino-acid profile for antiparallel dimers (Walshaw and Woolfson, 2001b) 
(Table I). Briefly, this profile is a somewhat dampened version of the parallel 
dimer profile, with a preference for Leu at d and Leu/Ile at a. The interchain 
interactions between the core-flanking e and g residues are, however, very 
different: in parallel structures g:e’ interactions are made across the 
interface, whereas in antiparallel structures they are g:g’ and e:e’. This provides 
the protein designer with a negative-design rule and a route to designing 
antiparallel coiled coils, avoiding alternative parallel structures. 


1. The First Hodges Construct (1993) 


As with the first design for a parallel coiled coil, Hodges and colleagues 
were also the first to describe a construct for an antiparallel coiled coil 
(Monera et al., 1993). This was based on the (g-f) heptad repeat 
KLEALEG (Hodges et al., 1981). To construct the antiparallel system, several 
changes were made to the design. First, Leu-16, an a site, was replaced by 
Ala. Second, two peptides were used, incorporating Cys at the first a site 
(peptide C2A16) and the last d site (peptide C33A16). Both peptides were 
five heptads long. The two peptides were mixed under denaturing condi- 
tions and air-oxidized to give a 1:1:2 mixture of the two forced-parallel 
homodimers and the forced-antiparallel heterodimer. Note that this con- 
struct did not consider electrostatic pairings between the e and g sites: the 
g:g’ and e:e’ pairs in the antiparallel construct are all repulsive, whereas 
the g:e’ pairs in the parallel forms are attractive. Characterization of the 
forced-parallel and antiparallel constructs reflected this. With respect to 
heat and urea-induced denaturation, the forced-antiparallel dimer was less 
stable than either of the covalent parallel dimers; although the relative 
stabilities with respect to guanidinium hydrochloride-induced 
denaturation were reversed, this is explained by a salt effect. Furthermore, when 
the covalent dimers were made under benign (folding) conditions, only 
the parallel dimers were formed; equilibration of the forced-antiparallel 
dimer with disulfide-exchange reagents returned the parallel dimers. 
Nonetheless, the antiparallel construct made was helical, cooperatively 
folded, and represented the first engineered antiparallel coiled coil. 
Furthermore, in the same paper, Hodges and colleagues recognized 
the error in their design and described a final construct in which 
charged pairs were optimized to favor the antiparallel heterodimer. They 
used these constructs in several subsequent papers to probe sequence- 
to-structure/stability relationships in parallel and antiparallel coiled coils 
(Monera et al., 1994, 1996a,b). 
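
The electrostatic argument above can be checked with a small Python sketch. It takes two heptads written in the (g-f) register and classifies the core-flanking pairs that would face each other in a parallel dimer (g:e’) and in an antiparallel dimer (g:g’ and e:e’). The charge table is a deliberate simplification, the function names are hypothetical, and the idealized KLEALEG repeat stands in for the full peptides, which also carried the Ala and Cys substitutions described in the text.

```python
REGISTER = "gabcdef"  # heptads are written g-f, as in the text
CHARGE = {"K": +1, "R": +1, "E": -1, "D": -1}  # simplified side-chain charges


def classify(res_1, res_2):
    """Label a residue pair by the sign of the product of its charges."""
    product = CHARGE.get(res_1, 0) * CHARGE.get(res_2, 0)
    if product < 0:
        return "attractive"
    if product > 0:
        return "repulsive"
    return "neutral"


def flanking_interactions(heptad_1, heptad_2):
    """Classify e/g pairings for parallel and antiparallel arrangements."""
    if len(heptad_1) != 7 or len(heptad_2) != 7:
        raise ValueError("each heptad must contain exactly seven residues")
    p1 = dict(zip(REGISTER, heptad_1))
    p2 = dict(zip(REGISTER, heptad_2))
    return {
        # parallel: g of one chain faces e' of the other, in both directions
        "parallel": [classify(p1["g"], p2["e"]), classify(p2["g"], p1["e"])],
        # antiparallel: g faces g' and e faces e'
        "antiparallel": [classify(p1["g"], p2["g"]), classify(p1["e"], p2["e"])],
    }


if __name__ == "__main__":
    print(flanking_interactions("KLEALEG", "KLEALEG"))
```

For the KLEALEG repeat paired with itself, the parallel pairs are Lys:Glu (attractive) and the antiparallel pairs are Lys:Lys and Glu:Glu (repulsive), in line with the stability ordering reported for the first construct.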


2. CCSL: A Coiled-Coil Stem Loop Structure of de novo Design (1994) 


Myszka and Chaiken (1994) described the rational design of a 
single-chain antiparallel coiled-coil structure. This design was very well 
conceived. The peptide was 56 residues long and it had two 25-residue coiled-coil 
sequences separated by a flexible and functional (RGD-containing) 
six-residue loop. Except for the first a:d’ layer (which is Cys:Cys), the d:d and 
a:d layers were Leu:Leu and Val:Val, respectively. Furthermore, the e and g 
sites in the N-terminal half were all Glu, whereas those in the C-terminal 
half were Lys to facilitate favorable g:g’ and e:e’ interactions. The unit was 
stapled by a disulfide bond between Cys-1 and Cys-56 at a and d sites, 
respectively, and the remaining b, c, and f sites were Ala and Ser. CCSL was 
soluble, monomeric, approximately 80% helical, and unfolded 
cooperatively. Finally, the peptide was active, competing with fibrinogen for the 
GPIIbIIIa receptor. 

Based on very similar principles, though more recently, Suzuki and Fujii 
(1999) designed a helix-loop-helix peptide. On a related theme, the 
aforementioned design from Pandya et al. (2004) for a helix-loop-helix 
peptide stabilized by a disulfide bridge used similar ideas, although the 
final sequence was very different from the Myszka and Chaiken and the 
Suzuki and Fujii peptides, as it was also made compatible with a parallel 
coiled-coil dimer to promote conformational switching. 


3. An Antiparallel Variant of Peptide Velcro (1998) 


Oakley and Kim (1998) have described a simple and elegant experiment 
to switch the aforementioned Peptide Velcro from a parallel heterodimer 
to an antiparallel structure. The strategy was straightforward and resulted 
in a powerful design rule for specifying antiparallel structures. The Acid-p1 
and Base-p1 sequences of Peptide Velcro were redesigned so as to 
mismatch the Asn residues at their central a sites; in Acid-a1 Asn is placed at 
the third a site, while in Base-a1 it is at the second d site. In the targeted 
antiparallel structure, these sites form an a:d’ layer. The solution-phase 
characterization indicated that the favored complex was the antiparallel 
heterodimer. This study added to the rules for coiled-coil assembly and 
specification because an Asn:Asn pair at a:d was unprecedented in natural 
coiled-coil structures (Oakley and Hollenbeck, 2001). Oakley and 
colleagues have extended this work, refining the designs of Acid-a1 and 
Base-a1 using both positive and negative design principles to improve the 
g:g’ interactions to favor the antiparallel orientation over the competing 
parallel arrangement (McClain et al., 2001). 


4. Antiparallel Leucine-Zipper-Directed Protein Fragment 
Complementation (2000) 


A number of proteins are known that can be split into two frag- 
ments, which when recombined form functional, though usually less 
stable, folded structures (Michnick, 2001). This is known as fragment 
complementation. For proteins that cannot be split and reconstituted so 
straightforwardly, there is the possibility that reassembly might be directed 
by additional reagents such as ligands. Such systems present a potential 
route to new biosensors. 

Ghosh et al. (2000) have described an elegant system in which two 
fragments of the green fluorescent protein (GFP) from Aequorea victoria 
could be recombined only in the presence of helical tags engineered at 
the new termini created by the split. The helices were two halves, NZ and 
CZ, of a designed antiparallel coiled-coil dimer. The peptides were 29 and 
30 residues long respectively. The cores were all Leu, except the second d 
site of NZ and the third a of CZ, which were Asn. All of the potential e:e’ 
and g:g’ pairs were Lys:Glu, and the remaining b, c, and f sites were 
combinations of Ala, Gln, and Lys with a single Trp chromophore at an 
f site in each peptide. Reconstitution of GFP was demonstrated both in vitro 
using the purified protein fragments, and in vivo by co-expression of the 
two components in E. coli. 


5. APH: A Designed Antiparallel Homodimer (2003) 


The designs outlined above are all, in effect, for heterodimers. The 
design of a homodimeric antiparallel coiled-coil is more challenging. 
For instance, specification via the burial of two Asn residues, one at a 
and the other at the corresponding d site, is not possible. In a thoughtful 
design, Oakley and colleagues tackled this problem to generate 
APH (Gurnon et al., 2003). APH is a 45-residue recombinant peptide 
overexpressed in E. coli. The design had the following key features: 
although the core was predominantly Leu, there were two Ala:Ile pairs and 
two Leu:Arg pairs at a:d’ layers. The argument for using the former pair 
was that the small Ala side chain helps accommodate Ile in the antiparallel 
form. The inclusion of Arg within the core followed work by the groups of 
Oakley and of Lumb indicating that this provided specificity at a smaller 
energetic price than buried Asn residues (Campbell et al., 2002; McClain 
et al., 2002). On the basis of amino-acid profiles (Walshaw and Woolfson, 
2001b; Woolfson and Alber, 1995) and experimental studies (Gonzalez et al., 
1996c), these inclusions probably also destabilize potential parallel dimers 
and trimers, respectively. Finally, to favor the antiparallel form over 
alternative parallel structures, all potential e:e’ and g:g’ interactions were made into 
attractive Glu:Lys pairs, whereas potential g:e’ interactions (g of heptad n 
with e’ of heptad n+1) were made 
repulsive. The design held up to scrutiny by solution-phase biophysics. 


IV. SUMMARY 


This review has attempted to build on the preceding Chapters that 
convey the richness of coiled-coil sequences and structures. With this 
backdrop of many potential coiled-coil structures and very many potential 
coiled-coil sequences, one aim of this Chapter has been to illustrate that 
sequence-to-structure relationships in coiled coils can be understood 
sufficiently to attempt with some confidence and, hence, with some degree 
of success, the rational design of coiled-coil structures. At present, this 
process is largely done on the backs of envelopes guided by good rules of 
thumb, intuition, and imagination although, and as noted, computer- 
aided designs are increasingly being reported. Nonetheless, the back-of- 
the-envelope approach has had many successes from designs for basic, 
naturally observed structures to more-imaginative ones that might be used 
as probes, tethers, molecular rulers, affinity matrices, and components of 
new materials. In these respects, coiled-coil design is arguably the most 
successful of all protein-design areas. As well as continuing such efforts, 
one challenge in the near future will be to formalize the design rules by 
linking them to thermodynamic measurements and employing them fur- 
ther in prediction and computer-aided design algorithms. The future of 
coiled-coil design is very exciting indeed. 


ABSTRACT 


A large number of intermediate filament (IF) chains have now been 
sequenced. From these data, it has been possible to deduce the main 
elements of the secondary structure, especially those lying within the 
central rod domain of the molecule. These conclusions, allied to results 
obtained from crosslinking studies, have shown that at least four unique 
but related structures are adopted by the class of structures known generi- 
cally as intermediate filaments: (1) epidermal and reduced trichocyte 
keratin; (2) oxidized trichocyte keratin; (3) desmin, vimentin, neurofila- 
ments, and related Type III and IV proteins; and (4) lamin molecules. It 
would be expected that local differences in sequences of the proteins in 
these four groups would occur, and that this would ultimately relate to 
assembly. Site-directed mutagenesis and theoretical methods have now 
made it possible to investigate these ideas further. In particular, new data 
have been obtained that allow the role played by some individual amino 
acids or a short stretch of sequence to be determined. Among the observa- 
tions catalogued here are the key residues involved in intra- and interchain 
ionic interactions, as well as those involved in stabilizing some modes of 
molecular aggregation; the structure and role of subdomains in the head 
and tail domains; the repeat sequences occurring along the length of the 
chain and their structural significance; trigger motifs in coiled-coil seg- 
ments; and helix initiation and termination motifs that terminate the rod 
domain. Much more remains to be done, not least of which is gaining an 
increased understanding of the many subtle differences that exist between 
different IF chains at the sequence level. 


I. INTRODUCTION 


Over the past 25 years, a wealth of amino acid sequence data have been 
derived for the class of proteins known generically as intermediate filaments 
(IFs). These include: the Type I and Type II chains characteristic of 
trichocyte and epidermal keratins; the Type III chains of desmin, vimen- 
tin, glial fibrillary acidic protein, and peripherin; the Type IV chains in 
neurofilaments and of α-internexin; the Type V chains of nuclear lamins; 
and, finally, the Type VI chains of nestin (and possibly synemin and 
paranemin). There is some debate on the classification of nestin as a Type 
VI IF protein, and Shaw (1998) has suggested that it should be designated 
within the Type IV group on the basis of gene structure. Other IF proteins 
include phakinin and filensin of beaded filaments in lens fiber cells. While 
all of these proteins are intracellular ones (though occurring at both 
cytoplasmic and nuclear sites), there is at least one other group that occurs 
extracellularly: these proteins are produced in the gland thread cells of 
the hagfish (Downing et al., 1981; Koch et al., 1991, 1994, 1995). Further- 
more, intermediate filaments also occur widely in invertebrates (see, for 
example, Riemer et al., 1998). 

Some IF protein chains are able to homopolymerize in vivo to form 
structurally viable and functional IFs. These include desmin, vimentin, 
peripherin, glial fibrillary acidic protein, NF-L, nestin, and phakinin. 
Some other chains only form IFs if they have an appropriate copolymeri- 
zation partner. These include the keratins (which form Type I/Type II 
molecules: Hatzfeld and Weber, 1990; Herrling and Sparrow, 1991; Parry 
et al., 1985; Steinert, 1990), nestin (which copolymerizes with vimentin and 
α-internexin: Steinert et al., 1999a,b), paranemin and synemin (with 
desmin or vimentin: Bilak et al., 1998), filensin (with phakinin: Goulielmos 
et al., 1996), and both the medium (NF-M) and heavy (NF-H) 
neurofilament chains (with NF-L: Ching and Liem, 1993; Lee et al., 1993). Some IF 
proteins are able to form both homo- and heteropolymers depending on 
circumstances: desmin, vimentin, peripherin, glial fibrillary acidic protein, 
α-internexin, nestin, phakinin, and NF-L (Herrmann and Aebi, 2000; 
Parry and Steinert, 1999; Steinert et al., 1999a). There is increasing 
evidence that the copolymerized molecules may be located preferentially on 
the surface of IFs and that the core comprises predominantly homodimers 
of a more abundant chain type (Goulielmos et al., 1996; Herrmann and 
Aebi, 2000). 

The primary structure data have provided a strong basis for both 
theoreticians and experimentalists to gain an increased understanding 
of the conformation of an individual IF chain, its resulting molecular 
structure, and its mode of assembly to form viable IFs. All of these IF 
chains are homologous to one another to varying degrees over a central 
region, and distinct subdomains have been described. In particular, there 
are four regions with a heptad substructure that are predicted to form 
coiled coils; these are designated 1A, 1B, 2A, and 2B. They are separated 
by short noncoiled coil regions lacking a heptad repeat, known as linkers 
(L1, L12, and L2, respectively; Fig. 1a). In contrast, the end domains of 



Fig. 1. Schematic diagram of (a) an intermediate filament heterodimer with 
coiled-coil domains 1A, 1B, 2A, and 2B, and noncoiled-coil connecting linkers L1, L12, and 
L2. A stutter occurs in the heptad substructure at a point close to the center of segment 
2B. The N-terminal “globular” domains (green for Type I and brown for Type II 
chains) are termed the heads, and the C-terminal domains (red for Type I and orange 
for Type II chains) are designated the tails. In (b), the heads are shown folded back 
over the rod domain, where it is believed that this will stabilize segment 1A. In (c), the 
heads are shown away from the body of the rod domain and in a position where they 
can interact more easily with other cellular entities. As a consequence, segment 1A may 
become destabilized and hence unwind to form two separate α-helical strands. 


different chain types display considerable variability in both size 
and chemical character. In turn, the molecule contains two parallel chains 
lying in axial register; these form a central rodlike domain with a coiled- 
coil structure. The rod separates the “globular” head and tail domains, 
and in so doing forms a tripartite molecule. In simple terms, the rod 
domains dominate (though not absolutely) in the assembly of mole- 
cules to form the core of the filaments. A significant fraction of the end 
domains is thus disposed in outer positions on the filament surface, where 
they can interact optimally either with one another or with other cellular 
entities for functional and/or structural reasons. A number of reviews are 
available regarding IF structure, assembly, and function (see, for example, 
Herrmann et al., 2003; Parry, 1997; Parry and Steinert, 1995, 1999; 
Herrmann and Aebi, 1999a). These will provide the reader with additional 
and useful background material to the subject matter that is the theme of 
this Chapter. 

The purpose of this review is to summarize details of both the specific 
structures and roles which individual amino acids, short stretches of 
amino acids, and subdomains play in vivo. In some cases, the pertinent 
sequence identified as significant on the basis (perhaps) of high sequence 
conservation or regularity remains largely uninterpretable in detail, 
though often strong indications may exist as to its likely structure/func- 
tion. In other instances, there is a high degree of very specific information 
at hand; this will be chronicled here individually for each residue, se- 
quence, or segment of structure, as the case may be. It is believed that this 
compendium is the first of its type and that it will, hopefully, provide a 
useful insight to those in the field as to the gaps that exist in our present 
knowledge. 


II. SEQUENCE REGULARITIES AND CHARACTERISTICS, AND THEIR 
EFFECTS ON SECONDARY AND TERTIARY STRUCTURE 


A. Head Domain 


To facilitate easy access to information, this section has been subdivided 
(in most cases, purely on the basis of chain type). In a number of 
instances, however, published data refer generically to both head and tail 
domains; in order not to repeat the information in Sections II.A and II.I, a 
separate section (Section II.J) has been written to cover these features. 
They include: (1) the covalent binding sites in head and tail domains that 
are involved in crosslinks with other proteins in the cell; and (2) post- 
translational modifications and their structural/functional effects in vivo. 


1. Epidermal Keratins (Type I/Type II) 


The head domain in all epidermal keratins displays a highly developed 
substructure (Fig. 2a: Steinert et al., 1985). The region immediately 
N-terminal to the beginning of the rod domain is designated the 
Homologous 1 (H1) subdomain and is characterized by a well-conserved 
sequence of seven residues in Type I chains and 36 residues in Type II 
chains. Equivalent regions are found in the trichocyte keratins (Section 
II.A.2). The consensus sequences are given in Parry and Steinert (1999). 
While there is little knowledge of the precise structure of the H1 
subdomain, it has been shown experimentally that in Type II chains its 
functional role is to specify molecular aggregation at the two- to 
four-molecule level (Hatzfeld and Burba, 1994; Steinert and Parry, 1993; 
Steinert et al., 1993a). N-terminal to H1 is subdomain Variable 1 (V1), 
which has a variable length and sequence but is nonetheless characterized 
by a high content of glycine and serine residues. V1 has a length in the 
range 65 to 140 residues in Type I chains and 20 to 160 residues in Type II 
chains. Finally, the most N-terminal region is called the End (E) subdo- 
main, and this is often relatively basic in character. It is short in Type I 
chains (0-10 residues) but generally longer in Type II chains (5-70 
residues). Little is known about the overall conformation of head domains 
in epidermal keratins, though a good working model displaying a glycine 
loop structure for subdomain V1 is available (Korge et al., 1992a, 1992b; 
Steinert et al., 1991). In essence, runs of glycine-rich sequence are an- 
chored by interactions between aromatic and/or large apolar residues 
(Fig. 2b). On theoretical (Conway et al., 1989) and experimental grounds 
(Mack et al., 1988), such a conformation has been shown to be very 
flexible. The glycine loop model also allows deletions and insertions to 
occur, as has been characterized in detail for the V2 subdomain in human 
K10 (see Section II.I). The structure also permits variations in glycine loop 
size to occur without impinging adversely on the remaining structure. 


2. Trichocyte Keratins (Type I/Type II) 


The head domain in trichocyte (or hair) keratins has been character- 
ized by Parry and North (1998) and Parry et al. (2002) and shown to 
consist of two domains—a basic one (NB) at the N-terminal end of the 
head domain (27 and 70 residues long, respectively, for Type I and Type II 
chains) and an acidic one (NA) lying between the rod and NB. The latter 
is 29 and 34 residues long for Type I and Type II chains, respectively. 
Importantly, however, NA is homologous to H1 in the respective chain 
types of epidermal keratins. There can therefore be little, if any, doubt 
that the role of these regions in trichocyte and epidermal keratins is 


Fig. 2. (a) Schematic diagram of the subdomain structure in epidermal keratin 
chains. Homologous (H), variable glycine-serine-rich (V) and generally basic (E) end 
domains are present and these have bilateral symmetry about the central rod domain. 
The H subdomains have a special role in stabilizing oligomers (about 2-4 molecules) 
during the early stages of assembly to form filaments. The numbers of residues in each 
subdomain are indicated below for both the Type I and Type II chains. (b) The 
sequences of the V2 subdomain in the human K10 chain show marked differences in 
allelic variants that have been characterized. Deletions and insertions are seen in the 
loop structure anchored and stabilized by interactions between the aromatic residues 
tyrosine and phenylalanine. Glycine and serine residues dominate the loops, which are 
believed to be highly flexible structures. Redrawn from Parry and Steinert (1995) and 
based on an original by Korge et al. (1992a). 


identical (i.e., they stabilize assembly at the two- to four-molecule level). 
NA and NB both have clear-cut amino acid composition characteristics, 
which differ considerably between chain types. 

The Type I head domain structure is not easily defined. Starting from 
the N-terminal end of the chain, residues 1 through 11 lack charged 
amino acids. Residues 12 through 18 are partially conserved and contain 
both serine and arginine in greater amounts than expected on average. 
Residues 19 through 47 lack charged amino acids but cannot be usefully 
subdivided further. After a single glutamic acid (residue 48), there is a 
region (residues 49-55) directly homologous to H1 in the Type I 
epidermal chain (Fig. 3a). Overall, residues 1-11 and 19-47 have an underlying 
nonapeptide repeat of the form C/F-F/N-X-F/P-C-L-P-G-S. There 
are no indications at present as to the conformation adopted by this 
quasi-motif. 
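
Because the repeat is described as underlying rather than exact, a position-by-position score is a more forgiving way to look for it than an exact pattern match. The Python sketch below parses a consensus written in the notation used above (alternatives separated by "/", X for any residue) and scores every nine-residue window of a sequence. The function names, the score threshold, and the demonstration sequence are hypothetical and are not taken from any keratin chain.

```python
def parse_consensus(consensus):
    """Turn 'C/F-F/N-X-...' into a list of allowed-residue sets (None = any)."""
    allowed = []
    for token in consensus.split("-"):
        options = {t.strip() for t in token.split("/")}
        allowed.append(None if options == {"X"} else options)
    return allowed


def window_score(window, allowed):
    """Count the positions in `window` that agree with the consensus."""
    return sum(1 for res, opts in zip(window, allowed) if opts is None or res in opts)


def scan(sequence, consensus, min_score):
    """Return (start index, window, score) for windows scoring >= min_score."""
    allowed = parse_consensus(consensus)
    width = len(allowed)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = window_score(window, allowed)
        if score >= min_score:
            hits.append((i, window, score))
    return hits


if __name__ == "__main__":
    type_i_repeat = "C/F-F/N-X-F/P-C-L-P-G-S"
    demo = "AAACFAPCLPGSAAA"  # made-up sequence for illustration only
    print(scan(demo, type_i_repeat, min_score=9))
```

Lowering `min_score` admits imperfect copies of the repeat, at the cost of more chance matches; any threshold would need to be calibrated against real sequences.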

In the Type II head domains, the sequences display little homology 
over the first 12 residues but there is considerable consensus over the next 
24 (residues 13-36). This sequence is R-A-F-S-C-V-S-A-C-G-P-R-P- 


[Figure 3 panel labels: (a) Type I: N-terminus; uncharged segments; residue markers 1, 11, 12, 18, 19, 47, 49, 55; H1; rod domain; polyglycine II helix (res. 22-42). (b) Type II: N-terminus; conserved sequence; four-stranded antiparallel β-sheet with nonapeptide quasi-repeat (res. 37-72); residue markers 76 and 109; four-stranded antiparallel β-sheet (res. 2-55); 56+ variable length; C-terminus.] 


Fig. 3. Schematic diagram of (a) Type I and (b) Type II trichocyte keratin chains
illustrating zones of particular character in the sequences and the proposed secondary
structure in both the head and tail domains. The head domain in both the Type I and
Type II chains contains a conserved H1 subdomain believed to be important in
stabilizing the structure at the 2- to 4-molecule level of assembly. The Type I chain also
contains two segments lacking charged residues that enclose a short region rich in
serine and arginine residues. The Type II chain contains a four-stranded antiparallel
β-sheet with an underlying but poorly conserved nonapeptide repeat, as well as a
24-residue stretch of sequence displaying high conservation. Its role is unknown. The
tail domains are predicted to contain a polyglycine II helix and a four-stranded
antiparallel β-sheet, respectively, in the Type I and Type II chains.
G-R-C-C-I-T/S-A-A-P-Y-R. C-terminal to this region, there are four
contiguous but imperfect nonapeptide repeats (residues 37-72) of the
form (G-G-F-G-Y-R-S-X-G)4. Note that this repeat is different from the
one described in the Type I head domain. This region is predicted to
constitute a four-stranded antiparallel β-sheet that folds back over segment
1A and interacts with it under some circumstances, or which interacts with
an equivalent β-sheet in another molecule in a different IF in other
circumstances (Parry et al., 2002). Evidence in support of part of
this premise is the fact that disulphide crosslinks have been induced in
mouse hair between cys 99 in the Type II head domain and cys 6 in
segment 1A of the Type II chain, and between cys 75 in the Type II head
domain and cys 30 in segment 1A of the Type I chain (Parry et al., 2002).
Following a V-G-G tripeptide, the NA domain (residues 76-109) is completed
with a consensus sequence of P-S-P-P-C-I-T-T-V-S-V-N-E-S-L-
L-T-P-L-N-L-E-I-D-P-N-A-Q-C-V-K-Q/H-E-E. As noted before,
this is homologous to H1 in the epidermal Type II chain (Fig. 3b).


3. Desmin, Vimentin, Glial Fibrillary Acidic Protein, and Peripherin (Type III)
and Neurofilaments and α-Internexin (Type IV)


From a wide range of species, the head domain in vimentin, desmin,
peripherin, and Type IV IF protein contains a conserved nonapeptide of the
form S-S-Y-R-R-T-F-G-G. This is required for regular assembly to proceed
(Herrmann et al., 1992). The nonapeptide sequence is missing, however,
in glial fibrillary acidic protein, but two other motifs have been identified
(Ralton et al., 1994). Herrmann and Aebi (1998b) have pointed out that,
in addition to the conserved nonapeptide, there are six aromatic residues
roughly equally spaced in the head domain, as well as 12 arginine residues.
Overall, the head domain has a strongly hydrophilic character. A possible
conformation would be one akin to that proposed for the glycine-serine-rich
V domains in epidermal keratin. In that structure, and possibly here also for
Type III head domains, a "glycine loop" type of structure may exist with
loops being anchored through strong aromatic interactions (Fig. 2b).


B. Segment 1A 


Segment 1A is defined in large part by its underlying heptad repeat,
which has the form (a-b-c-d-e-f-g)n, where positions a and d are largely
occupied by apolar residues. Such a motif is characteristic of an α-helical
conformation that aggregates with others to form a multistranded coiled-coil
rope. In the case of IF molecules, there are two strands only, which are
aligned parallel to one another and in axial register (see, for example,
Parry et al., 1985 and the crosslinking data of Steinert et al., 1993a,b,c,
1999a).

The length of segment 1A is invariant (35 residues) in all IF chains. It is 
characterized by a number of unusual features and is clearly differentiated 
from other coiled-coil segments (1B, 2A, and 2B) in IF proteins. First, 
segment 1A contains more highly conserved residues across all IF chain 
types than any other coiled-coil segment in the rod domain: L7, N8, A12, 
V18, L21, E22, and N25 (Smith et al., 2002). This points to particularly 
important roles for segment 1A, some of which are described below. (One
particular role, that of a head-to-tail overlap between similarly directed
molecules, is discussed in Section II.H.) Second, towards the N-terminal
end of 1A, there is a very highly conserved stretch of sequence known
as the helix initiation motif that is rich in apolar and aromatic residues.
It encompasses residues 1 through 22 and has a consensus sequence
of K/R-X-T/Q-M/V/L-K/Q-X-L-N-D-R-F-A-S-F/Y-I-D/E-K-V-R-
F-L-E (Parry and Steinert, 1999; Smith et al., 2002). Third, at the
C-terminal end of this region, there is a potential interchain ionic interaction
that is conserved across all chain types (Smith et al., 2002). This occurs
between a basic residue in the g position (residue 17) and an acidic
residue in the e position (residue 22). Fourth, two of the three regions
in the rod domain displaying the greatest hydropathy are located in
segment 1A (residues 12-21 and 27-33). The third region lies at the
end of segment 2B (Section II.H). Fifth, there is a single potential intrachain
ionic interaction that is conserved across all chain types. It is of the
type i to i + 3 and occurs between an acidic residue in position 16 and an
arginine residue in position 19 (Smith et al., 2002).
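
The residue numbering used in this paragraph can be checked against the helix initiation motif itself. The short Python sketch below is an illustration only and is not taken from the cited studies. It assumes that the fourth token of the consensus reads M/V/L (the reading that gives exactly 22 residues), that the heptad frame is fixed by residue 17 lying in a g position as stated above, and that the set of residues treated as apolar (a hypothetical choice made here) includes the aromatic residues F, Y, and W.

```python
# Helix initiation motif of segment 1A as quoted in the text.
# Assumption: the fourth token is read as M/V/L; X stands for any residue.
CONSENSUS = "K/R-X-T/Q-M/V/L-K/Q-X-L-N-D-R-F-A-S-F/Y-I-D/E-K-V-R-F-L-E"

HEPTAD = "abcdefg"
# Hypothetical classification used only for this illustration.
APOLAR = set("AVLIMFYW")

def parse_motif(motif, first_residue=1):
    """Map residue number -> set of allowed one-letter codes."""
    tokens = motif.split("-")
    return {first_residue + i: set(tok.split("/")) for i, tok in enumerate(tokens)}

def heptad_position(residue, anchor_residue=17, anchor_position="g"):
    """Heptad letter of a residue, given one anchored residue (17 = g in the text)."""
    offset = HEPTAD.index(anchor_position) + (residue - anchor_residue)
    return HEPTAD[offset % 7]

motif = parse_motif(CONSENSUS)

# Residues of the motif that fall in the core positions a and d.
core = {n: aas for n, aas in motif.items() if heptad_position(n) in "ad"}
core_is_apolar = all(aas <= APOLAR for aas in core.values())

for n in sorted(motif):
    print(n, heptad_position(n), "/".join(sorted(motif[n])))
```

Under these assumptions the motif spans residues 1 to 22, the conserved residues L7, N8, A12, V18, L21, and E22 fall on the expected consensus positions, residue 22 lies in an e position, and every a and d position within the motif carries an apolar or aromatic residue.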

Next, although the heptad repeat is a well-conserved feature of segment 
1A, there are unique features present that are absent in other coiled-coils 
in α-fibrous proteins, including the others in IF proteins. This is manifested
in an unusually high apolar residue content in positions a and d,
especially the latter (76% in a and 92% in d compared to average values of 
73% and 76%, respectively). Also, the leucine residue content in position 
d (68%) is very much greater than normal (44%). The most significant 
difference, however, lies in the distribution of the amino acids in each of 
the heptad positions as compared to both an average two-stranded coiled 
coil and to segments 1B and 2B in particular (Smith et al., 2002). These 
were listed by the authors as follows: in position b, K increases from 2.5% 
to 14.6%, R increases from 6.8% to 23.0%, and E decreases from 19.6% 
to 3.6%; in position c, R decreases from 10.4% to 2.9%; in position e, 
E increases from 22.1% to 31.8%, and R decreases from 16.7% to 3.8%; in
position f, E increases from 9.2% to 20.7%, D increases from 7.1% to 
26.8%, and R decreases from 11.5% to 2.0%; in position g, E decreases
from 27.7% to 16.4%, R increases from 6.1% to 21.8%, and K increases
from 8.4% to 28.0%. Thus, positions c, e, and f become more acidic/less 
basic and positions b and g become more basic/less acidic compared to 
two-stranded coiled-coils in general. 
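
The summary in the last sentence can be reproduced from the percentages quoted above. The following Python sketch is a partial tally only: it counts just the residues listed in the text (not the full composition of each heptad position), and it assigns a nominal charge of +1 to K and R and -1 to D and E, which is a simplification introduced here for illustration.

```python
# Percentages quoted in the text, as (average two-stranded coiled coil, segment 1A).
CHANGES = {
    "b": {"K": (2.5, 14.6), "R": (6.8, 23.0), "E": (19.6, 3.6)},
    "c": {"R": (10.4, 2.9)},
    "e": {"E": (22.1, 31.8), "R": (16.7, 3.8)},
    "f": {"E": (9.2, 20.7), "D": (7.1, 26.8), "R": (11.5, 2.0)},
    "g": {"E": (27.7, 16.4), "R": (6.1, 21.8), "K": (8.4, 28.0)},
}

# Nominal side-chain charges (a simplifying assumption for this illustration).
CHARGE = {"K": +1, "R": +1, "D": -1, "E": -1}

def net_charge_shift(position):
    """Change in net charge (percentage points) from the listed residues only."""
    return sum(
        CHARGE[aa] * (segment_1a - average)
        for aa, (average, segment_1a) in CHANGES[position].items()
    )

for position in CHANGES:
    shift = net_charge_shift(position)
    trend = "more basic" if shift > 0 else "more acidic"
    print(f"position {position}: {shift:+.1f} ({trend})")
```

With these inputs the shift is positive for positions b and g and negative for positions c, e, and f, in line with the conclusion drawn by the authors.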

In the seventh difference, segment 1A lacks any significant periodicity in 
the linear distribution of its acidic or its basic residues. This is in direct 
contrast to these same residue groupings in both segment 1B and segment 
2 (2A-L2-2B). In these latter cases, stabilizing ionic interactions arising 
from the regular disposition of oppositely charged residues are believed to 
be important in the molecular aggregation process. Segment 1A, in
contrast, lacks such interactions. Finally, segment 1A represents one of
the two mutation hotspots in IF proteins (Parry and Steinert, 1995, 1999). 
This again suggests an important structural/functional role for this 
region. 

In order to study the residues likely to be important in stabilizing various 
modes of molecular aggregation, point mutations of charged residues 
were engineered along the length of the K5/K14 epidermal keratin 
chains. Aggregation characteristics of equimolar mixtures of wild-type 
and mutant chains (Mehrani et al., 2001) were then studied. The data
show that the conserved residue R10 in segment 1A was essential for
stability and that D/N9 was also important in this regard. These two
residues lie in close axial proximity in the A11 mode to conserved residues
4E and 6E in linker L2 that were shown to be essential for stability, and to
residue D7 in linker L2 that also has an important role in providing
stability. Likewise, the results have shown that the conserved residue D1
in segment 2A was important for stability, as were D/N3 (also in segment
2A) and conserved residue K31 in segment 1A. These two groups of
residues (1A-9/10 with L2-4/6/7 and also 1A-31 with 2A-1/3) lie in close
axial proximity in the A11 mode, and it was suggested that they are
involved in making stabilizing intermolecular ionic interactions and 
hydrogen bonds (Mehrani et al., 2001). 

Twenty-two naturally occurring mutations observed in segment 1A of
patients suffering from epidermolysis bullosa simplex (EBS) have been
studied using a molecular dynamics approach (Smith et al., 2004). This was
a comprehensive and computationally exhaustive study that yielded a
number of interesting results. For example, the conformational changes
observed fell into five categories. With regard to the backbone structure,
the mutation caused: (1) a local structural change only in the chain
in which the mutation was located; (2) a structural change only in the
partner chain, usually to the extent of one or two turns of α-helix;
(3) structural changes in both chains, but again only locally; (4)
widespread conformational change; and (5) little obvious conformational
modification. Furthermore, the effects of mutations in the inner a/d,
adjacent e/g, and outer positions b/c/f of the coiled coil were (surprisingly)
directly comparable. The same mutation in different chains of the heterodimer
did not necessarily give rise to the same structural change. Different
mutations at the same site (e.g., 1A-10) can give rise to quite different
structural changes, indicating that it is currently impossible to predict
from first principles the extent of a structural change introduced by a
specific mutation.

Unexpectedly, crystallographic studies of segment 1A revealed that the
structure was not that of a two-stranded coiled coil but was instead
characterized by separate α-helical strands (Strelkov et al., 2002). Each
individual α-helix had a coiled axis (radius of curvature, 8.4 nm) and it
was easily shown that two such strands could be assembled computationally
to form a coiled-coil molecule (radius, 0.5 nm; pitch length, 16.5 nm)
that was virtually identical to that of GCN4, an archetypical coiled-coil
structure (Smith et al., 2002). The idea that segment 1A could form a
coiled coil under some circumstances, but two individual α-helices under
other conditions, was an exciting one that intimated a more dynamic role
for the N-terminal end of the rod domain than had hitherto been assumed
(Herrmann et al., 2003; Parry et al., 2002). The observation by Strelkov et al.
(2001) that dimers formed only in the presence of the head domain can
be interpreted in terms of segment 1A being able to unwind. Furthermore,
it seems probable that the head domain stabilizes the dimeric form of
segment 1A by folding back over it (Fig. 1b and c). Smith et al. (2002)
calculated the stability of the α-helices in all coiled-coil segments in the
rod domain of IF proteins, and were able to demonstrate that the α-helices
in segment 1A were the most stable. This is consistent with them being
able to exist as individual α-helices when segment 1A unwinds into its
component strands.


C. Linker L1


Linker L1 is a short stretch of sequence that connects coiled-coil segments
1A and 1B. In contrast to these segments, L1 lacks both their
heptad repeat and their α-helical structure, the only exception being that
of the Type V lamins. The length of L1 is typically 9-16 residues for Type I
chains, 10-14 residues for Type II, 8-11 residues for Type III, 9-10 residues
for Type IV, and 9 residues for Type VI. The special case of lamin Type V
chains is discussed below. In general, the L1 sequences from different
chain types show very little similarity to one another over their central
regions. However, limited homology does exist at both ends of L1. All L1
linkers have a high charged/apolar residue ratio (typically about 1.6),
characteristic of an elongate conformation. Indeed, Parry et al. (2001)
have shown that linker L1 is the most elongate rod domain element in
terms of the average unit rise per residue. Crosslinking data show that the
axially projected length of L1 is 14.84 hcc (2.20 nm) in Type I and Type II
keratins, 11.47 hcc (1.70 nm) in Type III vimentin (Parry et al., 2001), and
14.50 hcc (2.15 nm) in Type IV α-internexin (Parry and Steinert, 1999),
where hcc is the unit rise per residue in a coiled-coil conformation
(0.1485 nm). Within the experimental limitations, these values probably
do not differ significantly from one another.
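
The conversion between projected lengths expressed in units of the coiled-coil rise per residue and in nanometres is a simple multiplication by 0.1485 nm. As a quick consistency check on the figures quoted above, a minimal Python sketch follows; the dictionary keys and function name are labels chosen here for illustration.

```python
# Unit rise per residue in a coiled-coil conformation (nm), as given in the text.
H_CC_NM = 0.1485

# Axially projected lengths of linker L1 in units of the coiled-coil rise.
L1_PROJECTED_HCC = {
    "Type I/II keratins": 14.84,
    "Type III vimentin": 11.47,
    "Type IV alpha-internexin": 14.50,
}

def projected_length_nm(length_in_hcc):
    """Convert a projected length from coiled-coil rise units to nanometres."""
    return length_in_hcc * H_CC_NM

for chain_type, length in L1_PROJECTED_HCC.items():
    print(f"{chain_type}: {length} x {H_CC_NM} nm = {projected_length_nm(length):.2f} nm")
```

The same conversion applies to any other length quoted in these units.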

Unlike other IF chains, linker L1 in the lamins is predicted to be
α-helical. Furthermore, the heptad repeat in segment 1A can be extended
C-terminally, and then in segment 1B N-terminally, to encompass the 
entire region normally defined as L1. The heptad repeat does suffer a 
phase change at the region normally termed L1, and this corresponds to 
an insertion of one residue or a deletion of six residues in an otherwise 
continuous heptad repeat. This type of heptad interruption is called a skip 
and is structurally equivalent to a pair of closely spaced stutters (three 
residue deletions). This results in a local unwinding of the coiled coil so 
that the strands probably coil around one another locally in a right- 
handed manner (Brown et al., 1996; Burkhard et al., 2001; Kühnel et al., 
2004; Ozbek et al., 2004; Strelkov and Burkhard, 2002; Stetefeld et al., 
2000). Thus, in lamins, segment 1 (1A + L1 + 1B) effectively adopts a
continuous coiled-coil structure over its entire length, and in this respect 
Type V lamins differ significantly from all other IF chain types. 
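
The equivalences stated above between insertions and deletions follow from counting residues modulo seven. The Python sketch below illustrates only this bookkeeping; it says nothing about the three-dimensional structure, and the function name is chosen here for illustration.

```python
HEPTAD_LENGTH = 7

def phase_shift(net_residues_inserted):
    """Shift of the heptad frame (0-6) for a net insertion (+) or deletion (-)."""
    return net_residues_inserted % HEPTAD_LENGTH

# A skip: insertion of one residue, or equivalently deletion of six.
skip_as_insertion = phase_shift(+1)
skip_as_deletion = phase_shift(-6)

# Two closely spaced stutters: two deletions of three residues each.
two_stutters = phase_shift(-3 - 3)

print(skip_as_insertion, skip_as_deletion, two_stutters)
```

All three values are the same, which is why a skip can be described in any of these ways. By the same counting, a single stutter (deletion of three residues) shifts the frame by the same amount as an insertion of four residues.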

Excluding the lamins, linker L1 is the most flexible rod domain region.
This was initially shown by Conway et al. (1989) and more recently
by Smith et al. (2002), who confirmed that L1 did indeed have the
highest flexibility index of any rod domain element in chain types I-IV.
It would appear that flexibility for linker L1 is an important structural
feature of IF molecules. Further proof of this point comes from the
elegant work of Herrmann and Aebi (1998b) and Herrmann et al.
(1999). A variety of mutants of L1 were prepared, including one in which
two additional α-helix-favoring alanine residues were inserted and one
residue was mutated to an alanine. This maintained the heptad phasing
from segment 1A through to segment 1B and gave the sequence a greater
probability of forming an α-helical conformation. Assembly of this mutant,
however, was adversely affected and only short and rather irregular
aggregates were formed. Also, a mutant of L1 in which one residue
was changed to a non-α-helix-favoring proline residue and two additional
prolines were added gave rise to normal filament formation. It
therefore is important for L1 not to form a regular α-helical coiled-coil
structure, and some flexibility is clearly crucial for correct assembly and
filament formation.

A mutation in L1 of particular interest has been reported by Mücke et al. 
(2004). In this case, the first residue in the linker in human vimentin has 
been altered from a lysine to a cysteine residue. At 37 °C, the K1C mutated
protein assembles normally into unit-length filaments (ULFs) within about 
10 seconds and into normal IFs after about an hour. When assembly is 
carried out at room temperature, ULFs are formed in the normal manner. 
However, elongation and assembly of the ULFs to form intact IFs was 
impeded significantly. Thus, the K1C mutation has a significant heat-
sensitive effect, not on the formation of the important ULF intermediary,
but on its ability to aggregate axially with others to form viable IFs. The 
rationale for this observation is not yet known, but several explanations 
exist. The lysine residue in the wild-type protein might have been involved 
in a salt bridge and the cysteine might be able to form a disulphide bond. 
It also needs to be remembered that apolar interactions are stronger at 
high temperatures. 

In many respects, the linker L1 can be likened to a flexible hinge, a role 
consistent with a swinging head model proposed for IFs. In this model, 
segment 1A can (under appropriate conditions) split into two separate
α-helical strands, thereby maximizing the range of movement of the head
domains and hence increasing their capacity to interact with various other 
cellular entities (Parry et al., 2002). 


D. Segment 1B 


Segment 1B has an uninterrupted heptad substructure (101 residues 
long) in all IFs except for Type V lamin and some invertebrate IF chains 
(Fig. 4). In these latter cases, segment 1B contains a 42-residue (six 
heptad) insertion located about 40 residues from its N-terminus (Fisher 
et al., 1986; McKeon et al., 1986; Parry et al., 1986). When a comparison is 
made of all IF sequences, it is apparent that three residues are virtually 
conserved across the entire spectrum indicating a special role, probably 
structural rather than functional. These residues are L74, L81, and 
E95 (Smith et al., 2002). In addition, there is a strong possibility of a
conserved interchain ionic interaction in all IF molecules between a con- 
served acidic residue in a g position of one chain (D/E84) and a 
conserved basic residue in an e position of a second chain (K/R89) (Smith 
et al., 2002). In precisely this same region (and almost certainly not by 
chance), a potential trigger motif has been recognized for epidermal 
keratin K5/K14 defined by residues 79-91 (Wu et al., 2000). This has a 
Fig. 4. Schematic diagram of segment 1B in intermediate filament chains showing
generally conserved features. These include the highly conserved residues L74, L81,
and E95; the conserved intra- and interchain ionic interactions represented by solid and
dotted lines, respectively; the trigger motif (residues 79-91) that acts as a particularly
stable region that "nucleates" coiled-coil formation; the site at residue 40 (indicated by
an asterisk) of a six heptad insertion in lamin molecules. The entire segment displays a
regular disposition of acidic and basic residues, each with a period of about 9.54
residues. These periods are approximately out of phase with one another.


sequence of D-A-L-M-D-E-I-N-F-M-K-M-F (K5 chain) and E-S-L-K- 
E-E-L-A-Y-L-K-K-N (K14 chain). Trigger motifs are believed to be 
especially important in providing stability for a coiled coil and, as a 
consequence, may act as a nucleating point for subsequent coiled-coil 
formation. In the case of cortexillin (Burkhard et al., 2000), stability was 
achieved through a network of intra- and interchain ionic interactions. 
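
The overlap between the trigger motif and the conserved residues mentioned earlier can be made explicit by numbering the two quoted sequences from residue 79. The Python sketch below is an illustration only; it assumes that both quoted sequences start at residue 79 of segment 1B and run contiguously to residue 91, as the stated range implies.

```python
# Trigger motif sequences in segment 1B as quoted in the text (residues 79-91).
K5_TRIGGER = "D-A-L-M-D-E-I-N-F-M-K-M-F".split("-")
K14_TRIGGER = "E-S-L-K-E-E-L-A-Y-L-K-K-N".split("-")
FIRST_RESIDUE = 79  # assumption: numbering starts at the first quoted residue

def number_residues(sequence, first_residue=FIRST_RESIDUE):
    """Map segment residue number -> one-letter amino acid code."""
    return {first_residue + i: aa for i, aa in enumerate(sequence)}

k5 = number_residues(K5_TRIGGER)
k14 = number_residues(K14_TRIGGER)

for name, chain in (("K5", k5), ("K14", k14)):
    print(name, "residue 81:", chain[81], "| 84:", chain[84], "| 89:", chain[89])
```

Numbered in this way, both chains carry the conserved leucine at position 81, an acidic residue at position 84, and a basic residue at position 89, so the conserved interchain ionic pair (D/E84 with K/R89) lies inside the trigger motif.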

Intrahelical ionic interactions are also important in stabilizing the
individual α-helices in the IF molecule. There are a large number of
conserved but oppositely charged residue pairs in segment 1B. These
are all of the type i to i + 4, and are as follows: 42K/R-46E, 46E-50R,
50R-54E, 61R/K-65D, 71R/K-75E, and 89K/R-93E/D (Fig. 4; Smith et al.,
2002). In all but one case, the ionic bonds form between a basic residue in
position i and an acidic residue in position i + 4.
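
The listed pairs can be checked mechanically for their spacing and for the order of the charges. The following Python sketch parses the pair notation used above; the parsing helper and the charge classification are written here purely for illustration.

```python
import re

# Conserved intrachain pairs in segment 1B, in the notation used in the text.
PAIRS = ["42K/R-46E", "46E-50R", "50R-54E", "61R/K-65D", "71R/K-75E", "89K/R-93E/D"]

BASIC = set("KR")
ACIDIC = set("DE")

def parse_side(text):
    """Split e.g. '42K/R' into (42, {'K', 'R'})."""
    match = re.fullmatch(r"(\d+)([A-Z/]+)", text)
    return int(match.group(1)), set(match.group(2).split("/"))

def parse_pair(pair):
    left, right = pair.split("-")
    return parse_side(left), parse_side(right)

parsed = [parse_pair(p) for p in PAIRS]

# Spacing between the two residues of each pair.
spacings = [second[0] - first[0] for first, second in parsed]

# Pairs running basic (i) -> acidic (i + 4), and the exceptions.
basic_first = [p for p, (first, second) in zip(PAIRS, parsed)
               if first[1] <= BASIC and second[1] <= ACIDIC]
exceptions = [p for p in PAIRS if p not in basic_first]

print(spacings, exceptions)
```

Every pair is spaced by four residues, and the only pair with the acidic residue first is 46E-50R, which matches the statement that all but one of the pairs run from a basic to an acidic residue.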

The acidic and the basic residues are not randomly distributed in 
segment 1B. An analysis of their axial distributions using Fourier trans- 
form methods has revealed that both groupings display a very significant 
period of about 9.54 residues (1.42 nm). The periods are roughly, but not 
exactly, out of phase with one another (Crewther et al., 1983; McLachlan
and Stewart, 1982; Parry et al., 1977). The structural rationale for the
alternating bands of positive and negative charge is self-evident: 1B segments
in different molecules will be in a position to interact strongly with
one another if their zones of opposite charge become axially aligned
(see Chapter 2, Fig. 6). Clearly, a family of closely related alignments
is possible (Fraser et al., 1985, 1986). Detailed molecular modeling of
the interactions between 1B segments in vimentin (Smith et al., 2003)
confirms that factors governing molecular alignment in the A11 mode lie
not only in charged interactions between 1B segments, but also in
interactions between various other parts of the molecule (note, for example,
the role of subdomain H1 in epidermal keratins; see Section II.A.1).
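
To illustrate how a period of this kind can be detected, the Python sketch below computes a simple scaled Fourier intensity for a set of residue positions over a range of trial periods. It is a minimal illustration and not the procedure used in the cited studies. The positions are synthetic (hypothetical) data generated here with a spacing of about 9.54 residues along a 101-residue segment; they are not taken from any real sequence.

```python
import cmath

def periodicity_spectrum(positions, periods):
    """Scaled intensity |sum_k exp(2*pi*i*x_k/p)|^2 / N for each trial period p."""
    n = len(positions)
    spectrum = {}
    for p in periods:
        total = sum(cmath.exp(2j * cmath.pi * x / p) for x in positions)
        spectrum[p] = abs(total) ** 2 / n
    return spectrum

# Hypothetical data: one charged residue roughly every 9.54 residues.
synthetic_positions = [round(k * 9.54) for k in range(11)]

# Trial periods from 5.00 to 20.00 residues in steps of 0.01.
trial_periods = [p / 100 for p in range(500, 2001)]

spectrum = periodicity_spectrum(synthetic_positions, trial_periods)
best_period = max(spectrum, key=spectrum.get)
print(f"strongest period: {best_period:.2f} residues "
      f"(scaled intensity {spectrum[best_period]:.1f} for {len(synthetic_positions)} residues)")
```

For a perfectly periodic set of N positions the scaled intensity at the true period equals N, so a peak approaching N signals a strong periodicity. With the synthetic input used here the strongest period lies close to 9.5 residues; rounding the positions to whole residues accounts for the small departure from 9.54.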

X-ray studies on oriented films formed from isolated 1B segments from 
trichocyte keratin (wool) have revealed a typical coiled-coil diffraction 
pattern and axial period of 16 nm (Suzuki et al., 1973). The length of 
segment 1B is about 15 nm (101 x 0.1485 nm); hence, it was proposed 
that the relative stagger between oppositely directed 1B segments must be 
small (<15 residues). From crosslinking data subsequently obtained, the 
relative stagger was shown to lie in the 15 to 20 residue range (Wang et al., 
2000), a value in close agreement with the estimate derived from in vitro 
X-ray diffraction experiments. 


E. Linker L12 


Linker L12, a region lacking a heptad repeat, lies at the center of 
the rod domain. Its length varies, though not by a great deal, from one 
chain type to another. It is 16, 17, 16-18, 15-22, 19, and 19-21 residues
long, respectively, for the vast majority of the Type I, II, III, IV, V, and
VI chains characterized thus far. In terms of physical projected length
in the filament as determined from crosslinking studies (Parry et al.,
2001), this corresponds to 12.57 hcc (1.87 nm) for trichocyte and epidermal
keratins (Type I and Type II), and about 11.99 hcc (1.78 nm) for
neurofilaments and α-internexin (Type III and Type IV). Within the
limitations of the data, these values are the same. 

Little is known about the conformation of linker L12. In all cases except
Type V lamin molecules, it is predicted that the structure will not be α-helical.
There is some similarity between chain types in their consensus sequences,
and it follows that common structural features in L12 are likely across all chain
types. The variations in sequence length could be taken up as external loops. A
portion of the sequences do exhibit a conserved character in which an apolar-X
motif is repeated four times (North et al., 1994). It is possible that this will
form a short length of β-structure. Noting that there are two chains in the
molecule and that antiparallel molecules overlap their L12 segments to a
large degree, a four-chain β-sheet or equivalent structure could occur. The
evidence as yet is weak and further experimental data will be required before
this idea will gain significant credence.

Electron microscope observations on individual IF molecules have 
shown that the molecules can bend, and even fold, over the L12 segment 
(Steven et al., 1989). This is consistent with its predicted flexibility
(Conway et al., 1989; Smith et al., 2002).

Gill et al. (1990) have engineered additional residues into linker L12.
The result of these studies was that relatively normal filament formation 
was able to proceed. This indicates (but does not prove) that additional 
residues, which allow flexibility to be maintained, can be tolerated without 
compromising filament formation. However, mutations in which flexibility 
is lost or decreased appear to lead directly to malformation of IFs. The 
flexibility index for L12 is less than for linker L1, but it is clearly still an 
important feature of the structure. 


F. Segment 2A 


Segment 2A is conserved in length (19 residues) across all chain types. 
It has an underlying heptad repeat characteristic of a coiled-coil structure, 
as well as an acidic and basic residue period (about 9.8 residues long) 
first reported by Crewther et al. (1983). This is less easily seen by consider- 
ing segment 2A alone (since it is so short), but is readily observed 
when segment 2A is considered as an extension to segment 2B (see 
Section II.H). Within segment 2A, there is a conserved basic residue in 
position 10 and a conserved acidic residue in position 14, thus allowing 
the formation of a conserved and potentially stabilizing intrahelical ionic 
interaction of the type i to i + 4 (Smith et al., 2002).

The mode of molecular assembly of K5/K14 epidermal keratin has been 
studied in detail by making point mutations of charged residues along 
the chains, and then studying the aggregation of equimolar mixtures of 
wild type and mutant chains (Mehrani et al., 2001). Using this technique, 
it has been possible to show that the conserved residue D1 and the less 
well-conserved D/N3 in segment 2A, and the conserved residue K31 
in segment 1A, are all important for stability. These residues lie in close
axial proximity in the A11 mode, and it is speculated that they are involved
in making stabilizing intermolecular ionic interactions and hydrogen
bonds.


G. Linker L2 


Linker L2 is eight residues long for all IF chains, except for nestin (Type 
VI) where it is absent (Lendahl et al., 1990). From crosslinking studies, the 
axially projected length can be determined (Parry and Steinert, 1999). For 
the Type I and Type II keratins, the projected length is about 4.77 hcc
(0.71 nm). For Type III chains in vimentin IFs and Type IV chains in
α-internexin (as well as their copolymerized assemblies), the projected
lengths are about 7.79 hcc (1.16 nm). These data are limited and hence it is
likely that L2 will have a common projected length in all IF protein
molecules of about 1 nm.

As reported for segments 1A and 2A, point mutations were made in 
charged residues in K5/K14 epidermal keratin. The stabilities of the 
resulting aggregates were measured to ascertain, in particular, some 
of the more important residues involved in stabilizing the A11 mode of
molecular assembly. The results show that the conserved residues 4E and 
6E in linker L2 were essential for stability, and that residue D7 was also 
important (Mehrani et al., 2001). This group of residues lies immediately 
opposite to D/N9 and the conserved residue R10 in segment 1A. The 
residues were expected to interact favorably to provide the observed 
stability through both hydrogen bonds and intermolecular interactions. 

L2 represents the region over which the coiled-coil conformation under- 
goes a significant azimuthal change between that present in segment 2A 
and that found in segment 2B. The underlying heptad repeat suffers a 
large phase shift at L2, equivalent to the insertion of five residues or 
deletion of two residues in an otherwise continuous heptad structure. 
The position of the apolar stripe on the surface of the individual α-helices
thus changes from being internal to external. As this cannot occur in vivo, 
it follows that there must be a significant remodeling of the coiled coil to 
allow the appropriate phasing of the heptad repeat to be restored. In 
effect, the result is that there must be a marked azimuthal shift between 
the C-terminal end of segment 2A and the N-terminal end of segment 2B. 
There is a wide variety of possible conformations that are consistent with 
this idea and the actual amino acid sequences (see, for example, Conway 
and Parry, 1990; North et al., 1994). 
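
The size of the azimuthal change implied by this phase shift can be estimated with a simple calculation. The Python sketch below is an idealized illustration and not a model from the cited papers: it assumes that, in the frame of the coiled coil, the heptad repeat corresponds to exactly 3.5 residues per turn (seven residues in two turns), so that each residue advances the apolar stripe by 360/3.5 degrees, and it ignores all other structural detail.

```python
# Assumption: 7 residues per 2 turns in the coiled-coil frame.
RESIDUES_PER_TURN = 3.5
DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN

def stripe_shift_degrees(net_residues_inserted):
    """Azimuthal displacement (0-360 degrees) of the apolar stripe caused by a
    net insertion (+) or deletion (-) in an otherwise continuous heptad repeat."""
    return (net_residues_inserted * DEGREES_PER_RESIDUE) % 360.0

shift_for_insertion_of_five = stripe_shift_degrees(+5)
shift_for_deletion_of_two = stripe_shift_degrees(-2)

print(f"insertion of five residues: {shift_for_insertion_of_five:.1f} degrees")
print(f"deletion of two residues:   {shift_for_deletion_of_two:.1f} degrees")
```

Under this assumption the two descriptions of the L2 discontinuity give the same displacement of roughly 154 degrees, which places the apolar stripe much nearer the opposite face of the helix than its original position, in keeping with the description of the stripe moving from internal to external.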


H. Segment 2B 


The length of segment 2B is absolutely conserved (121 residues) in all IF 
chains, and over its entire range it exhibits a heptad substructure. How- 
ever, there is a conserved stutter close to the center point of segment 2B 
and this breaks the phasing of the heptad repeat (Fig. 5). Brown et al. 
(1996) have categorized all heptad discontinuities and shown that these 
fall into three significant groups—stutters, stammers, and skips. They corre- 
spond to deletions of three, four, and six residues respectively. Conforma- 
tionally, a stutter results in a local unwinding of the coiled coil such that the 
two strands lie approximately parallel to one another and to the axis of the 
molecule (Brown et al., 1996; Strelkov and Burkhard, 2002). The perfect 
conservation of the stutter in all IF proteins points clearly to its fundamental 
importance. In effect, it will allow some fine tuning of the azimuthal coordi- 
nate of the coiled coil to occur, thus facilitating appropriate molecular 
Fig. 5. Schematic diagram of segment 2B in intermediate filament chains showing generally conserved features.
These include the highly conserved residues K23, L103, E106, Y110, L113, and L114; the conserved intra- and
interchain ionic interactions that are represented by solid and dotted lines, respectively; the trigger motif (residues
100-113) that acts as a particularly stable region that "nucleates" coiled-coil formation; a region of particularly high
hydropathy (residues 95-108); the helix termination motif (residues 92-121); a stutter in the heptad substructure at
which point the coiled-coil structure unwinds locally. The entire segment displays a regular disposition of acidic and
basic residues, each with a period of about 9.84 residues. These periods are approximately out of phase with one
another.
interaction and assembly. Removal of the stutter in vimentin by inserting
three residues to allow the heptad phasing to be continued uninterrupted has
a negative effect on filament assembly, and only very short IFs are formed
in vitro (Herrmann and Aebi, 1998b). The conformational significance of the
stutter is beyond question.

The C-terminal end of segment 2B contains a very highly conserved 
sequence of the form E-Y-Q-X-L-L-D/N-V-K-X-R/A-L-D/E-X-E-I- 
A-T-Y-R-K/R-L-L-E-G-E-E/D-X-R-L/N/I. This is known as the helix
termination motif and encompasses residues 92-121. It can be regarded as
the counterpart of the helix initiation motif described earlier in segment
1A (Fig. 5). The termination motif is important for other reasons, too.
First, a trigger motif has been found in epidermal keratin (K5/K14) IFs, 
and this spans residues 100-113 (Wu et al., 2000) within the helix ter- 
mination motif. Its sequence is A-L-L-D-V-E-I-A-T-Y-R-K-L in K5 and 
T-R-L-E-Q-E-I-A-T-Y-R-R-L in K14. Second, all of the most highly 
conserved residues in segment 2B across all IF types occur in the helix 
termination motif (Smith et al., 2002): the residues are L103, E106, Y110, 
L113, and L114. Third, the conserved intrachain ionic interactions of the i
to i + 4 type are all in this same region: they are 100K-104D/E and 111R-
115E (Smith et al., 2002). Fourth, one of the two conserved interchain
ionic interactions also lies in this region. This is between an acidic residue
in position 106g and a basic residue in position 111e. The second is
between an acidic residue in position 25g and a basic residue in position
30e. (In passing, it should be noted that the sole conserved i to i + 3
intrachain ionic interaction lies between 28E/D and 31R; Smith et al.,
2002.) Fifth, the highest hydropathy calculated for IF chains occurs for
residues 95-108 (Smith et al., 2002). Sixth, in the A22 mode of molecular
interaction, Wu et al. (2000) have shown that E106 is important for
stability, as is K23 in the same segment. These residues lie in close axial
alignment in the A22 mode and are likely to form an ionic interaction.
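
The residue numbers quoted in this paragraph can be cross-checked against the helix termination motif. The Python sketch below is an illustration only; it assumes that the quoted consensus starts at residue 92 and runs contiguously to residue 121, with one slash-separated set of alternatives per residue and X standing for any residue.

```python
# Helix termination motif of segment 2B as quoted in the text (residues 92-121).
MOTIF = ("E-Y-Q-X-L-L-D/N-V-K-X-R/A-L-D/E-X-E-I-"
         "A-T-Y-R-K/R-L-L-E-G-E-E/D-X-R-L/N/I")
FIRST_RESIDUE = 92  # assumption: numbering starts at the first quoted residue

motif = {
    FIRST_RESIDUE + i: set(token.split("/"))
    for i, token in enumerate(MOTIF.split("-"))
}

# Most highly conserved residues named in the text.
conserved = {103: "L", 106: "E", 110: "Y", 113: "L", 114: "L"}
conserved_ok = all(motif[n] == {aa} for n, aa in conserved.items())

# Heptad bookkeeping for the two interchain pairs (25g-30e and 106g-111e).
# Both acidic residues are assigned to g positions, yet their separation is
# not a multiple of seven; the remainder matches a deletion of three residues.
separation_remainder = (106 - 25) % 7
stutter_remainder = (-3) % 7

print(min(motif), max(motif), conserved_ok, separation_remainder, stutter_remainder)
```

Numbered from residue 92, the 30-residue motif ends at residue 121, the named conserved residues fall on the expected consensus positions, and the intrachain pairs 100K-104D/E and 111R-115E are both present in the consensus. The remainder of four in the separation between the two g positions is what a single stutter (deletion of three residues) lying between them would produce, which is consistent with the conserved stutter near the center of segment 2B.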

The helix termination motif may well be unique in containing so many 
special features of structural and functional importance in such a short 
length of sequence. Indeed, the experimental observation that the 
C-terminal end of segment 2B was a mutation hot spot that would lead 
to a large number of diseases (Parry and Steinert, 1999) was readily 
predictable on theoretical grounds. There are, as stated above, a multitude 
of key residues in this region that are involved in almost every aspect of the 
structure and assembly of IF molecules. 

One aspect of the assembly of IF molecules that has received scant 
attention thus far relates to the small head-to-tail overlap that occurs 
between similarly directed molecules. This is defined by mode ACN. Its
value is generally about 7 hcc (7.03 hcc or 1.04 nm in epidermal keratin,
5.67 hcc or 0.84 nm in reduced trichocyte keratin, 4.31 hcc or 0.64 nm in
vimentin, and 10 hcc or 1.49 nm in α-internexin), although in oxidized
trichocyte keratin there is a head-to-tail gap of 8.43 hcc or 1.25 nm (Parry,
1995; Parry et al., 2001; Steinert et al., 1993a,b,c, 1999a; Wang et al., 2000).
Structural data on this region of segment 2B reveal that the α-helical
strands have separated over the last ten residues (Herrmann et al., 
2000). Knowing also that the strands of segment 1A can likewise be 
structurally independent of one another (Strelkov et al., 2002), it is 
interesting to speculate that the overlap can, under assembly conditions, 
be conformationally akin to a four-α-helix motif with two similarly directed
strands contributed by segment 1A and two oppositely directed strands 
contributed by segment 2B (Smith et al., 2002). A comparable structure 
has been considered by Carolyn Cohen for the head-to-tail overlap of 
similarly directed tropomyosin molecules. 
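
The paired values quoted above imply a conversion of roughly 0.1485 nm per hcc. That factor is inferred here from the quoted pairs (for example, 7.03 hcc and 1.04 nm) and is not stated explicitly in the text, so the following sketch should be read as a consistency check under that assumption and not as a definitive calibration.

```python
# Assumed conversion factor, inferred from the value pairs quoted in the text
NM_PER_HCC = 0.1485

# Head-to-tail overlap (or gap) values in hcc, as quoted in the text
QUOTED_HCC = {
    "epidermal keratin": 7.03,
    "reduced trichocyte keratin": 5.67,
    "vimentin": 4.31,
    "alpha-internexin": 10.0,
    "oxidized trichocyte keratin (gap)": 8.43,
}

def hcc_to_nm(value_hcc, nm_per_hcc=NM_PER_HCC):
    """Convert an axial distance from hcc units to nanometres."""
    return value_hcc * nm_per_hcc

for name, value in QUOTED_HCC.items():
    print(f"{name}: {value} hcc = {hcc_to_nm(value):.3f} nm")
```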

As in segment 1B, the linear distributions of the acidic and the basic 
residues are nonrandom. The statistically significant period determined by 
Fourier analysis (Parry and Fraser, 1985) is 9.84 hcc (1.46 nm). An important
point is that the difference in phasing between the acidic and the
basic periods is close, but not equal, to 180 degrees. Thus, an appropriate 
family of axial staggers between two 2B segments (parallel or antiparallel) 
will necessarily facilitate intermolecular ionic interactions. However, anti- 
parallel alignments will be favored over parallel ones because more inter- 
actions are possible. This arises directly from the difference in phasing of 
the identical acidic and basic periods not being equal to 180 degrees. The 
A22 mode of interaction is largely specified by these types of interactions,
though the contributions of hydrogen bonds and apolar interactions are
also likely to be important.
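
The idea of a shared period with a phase offset between the acidic and basic residues can be illustrated numerically. The sketch below is not the procedure of Parry and Fraser (1985); it is a minimal stand-in that sums unit phasors over residue positions, and the toy sequence and function names are hypothetical. The toy sequence is built so that the two periods are exactly out of phase, which is the limiting case that the text says the real sequences approach but do not reach.

```python
import cmath
import math

ACIDIC = set("DE")
BASIC = set("KR")

def structure_factor(sequence, residues, period):
    """Sum exp(2*pi*i*j/period) over positions j occupied by `residues`."""
    return sum(cmath.exp(2j * math.pi * j / period)
               for j, aa in enumerate(sequence) if aa in residues)

def scaled_intensity(sequence, residues, period):
    """|F|^2 / N: equals N for a perfect period, about 1 for random placement."""
    n = sum(aa in residues for aa in sequence)
    return abs(structure_factor(sequence, residues, period)) ** 2 / n if n else 0.0

def phase_difference_deg(sequence, period):
    """Phase of the acidic period relative to the basic period, in degrees."""
    fa = structure_factor(sequence, ACIDIC, period)
    fb = structure_factor(sequence, BASIC, period)
    return math.degrees(cmath.phase(fa) - cmath.phase(fb)) % 360.0

# Hypothetical toy sequence: acidic every 10 residues, basic offset by 5
toy = "EAAAAKAAAA" * 6
print(scaled_intensity(toy, ACIDIC, 10), phase_difference_deg(toy, 10))
```

Scanning `period` over a range of trial values and looking for peaks in `scaled_intensity` is one simple way to search for a candidate periodicity in a real sequence.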


I. Tail Domain 


1. Epidermal Keratins (Type I/Type II) 


The epidermal keratins have a tail subdomain structure closely parallel 
to that described for the heads in Section II.A.1. Immediately C-terminal 
to the rod domain exists a stretch of highly conserved sequence across 
Type II chains (Steinert et al., 1985). This is termed the H2 subdomain and 
is 20 residues long. An equivalent conserved region in Type I chains, 
however, is totally absent. C-terminal to H2 lies a sequence that varies 
between chains within a particular type, but which is rich in glycine and 
serine residues. This V2 subdomain, present in both Type I and Type II
chains, has a length that lies in the range 0-110 and 25-125 residues in the
two chain types, respectively. Human K1 and K10 chains both show
polymorphism in their V2, but not V1, subdomains (Fig. 2b). Insertions
and deletions are more prevalent in the Type II chains, but have also 
been recognized in the Type I chains (Korge et al., 1992a,b; Mischke, 1993; 
Parry and Steinert, 1999). The presence of a glycine loop conformation, 
of course, readily allows such polymorphism to occur without major 
structural implications. 


2. Trichocyte Keratins (Type I/Type II)


The tail domain in trichocyte Type I chains contains a tenfold P-C-X 
repeat, seven of them contiguously (Parry and North, 1998). Conformational
analysis suggests that it will most likely adopt a left-handed
polyglycine II structure with three residues per turn (Fig. 3a). The cysteine
residues would thus lie along one edge of the structure, where they
would be in positions to form disulphide bonds with cysteine residues in a 
similar structure from a different molecule. The role of this sequence 
motif would appear to be that of stabilizing molecular assembly within 
the IFs. 
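
The geometric argument, that a three-residue repeat on a helix with three residues per turn places every cysteine on the same edge, can be checked with a few lines of arithmetic. This is a sketch under idealized assumptions: exactly three residues per turn, a rotation of 120 degrees per residue, and a negative sign for left-handedness chosen purely as a convention. The names are hypothetical.

```python
REPEAT = "PCX"            # the P-C-X motif; X stands for any residue
N_REPEATS = 7             # the seven contiguous repeats mentioned in the text
DEGREES_PER_RESIDUE = -120  # idealized: 3 residues per turn, left-handed

def azimuths(sequence, step=DEGREES_PER_RESIDUE):
    """Azimuthal angle (degrees, 0-359) of each residue about the helix axis."""
    return [(i * step) % 360 for i in range(len(sequence))]

sequence = REPEAT * N_REPEATS
cys_azimuths = {az for aa, az in zip(sequence, azimuths(sequence)) if aa == "C"}
pro_azimuths = {az for aa, az in zip(sequence, azimuths(sequence)) if aa == "P"}

# A single value in each set means that residue type lines up along one edge
print("Cys azimuths:", cys_azimuths)
print("Pro azimuths:", pro_azimuths)
```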

The Type II trichocyte chain, however, contains a quite different structural
motif: a right-handed, four-stranded, twisted antiparallel β-sheet
(Fig. 3b). The tetrapeptides linking strands 1-2 and strands 2-3 are
homologous and have sequences of S-S-S-R and S-G-S-R/A. Similarly,
the tetrapeptides linking strands 3-4 and strand 4 with the remainder of
the tail region have homologous sequences of A-P-C-S and A-P-C-G,
respectively. In all four cases, the sequences are predicted to form a β-turn
structure. The β-strands are characterized by a two-residue apolar repeat,
resulting in one face of the β-sheet being almost totally apolar. The other
face would contain the cysteine residues. This structure immediately
suggests a role in vivo whereby the apolar face in the β-sheet of one
molecule would interact with the same domain in a second molecule in 
the IFs. This would again lead to stabilization of the molecular aggregate 
within the filament. 


3. Desmin, Vimentin, Glial Fibrillary Acidic Protein, and Peripherin 
(Type III), Neurofilaments and α-Internexin (Type IV), Lamins (Type V),
and Nestin (Type VI) 


Almost nothing is known of the role played by individual amino acid 
residues in the tail domains of Type III IF proteins. Also, very little 
sequence regularity has been observed. Nonetheless, the degree of se- 
quence similarity in the tail domains of these proteins suggests that a 
role analogous to that played by the H subdomains in the keratins is not 
unlikely. In other words, the tails may be important in stabilizing the
assembly of Type III IF proteins at the oligomeric stage, possibly at 
the two- to four-molecule level. 

In contrast to the lack of obvious subdomain structure in the Type III IF 
proteins, the organization of the tail domains in the Type IV IF proteins 
is well established, and consists of a number of regions of special sequence
and character (Fig. 6; Parry and Steinert, 1995). These include the glutamic
acid rich region (E segment), the conserved lysine-glutamic acid
region (KE segment), the lysine-serine-proline repeats (KSP segment),
and the terminal lysine-glutamic acid-proline segments (KEP segment). The
KSP repeats are the sites of phosphorylation, but the extent of these differs
markedly between species (see, for example, the work of Napolitano et al.,
1987 and Chin and Liem, 1990). The E segment is believed to be
important in IF assembly (Birkenberger and Ip, 1990).

The tail domain of nestin contains 22-residue (human) and 44-residue 
repeats (hamster and rat) (Parry and Steinert, 1999; Steinert et al., 1999b). 
The 44-residue repeat of hamster is E-E-D-Q-L/R-V-E-R/T-L-I-E-K- 
E-G-Q-E-S-L-S-S-P-E and E-E-D-Q-E-T-D-R-P-L-E-K-E-N-G-E-
P-L-K-P-V-E and, as divided here, can easily be seen to consist of a pair 
of related but nonidentical 22-residue sequences. Interestingly, even these 
22-residue repeats are quasi-halved, indicating an underlying 11-residue 
structure in all of the species studied. The secondary conformation is 
unknown, but there is an expectation of high α-helix content.
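
The relationship between the two 22-residue halves of the hamster repeat can be made explicit by comparing them position by position. The sketch below uses the two sequences exactly as printed above and treats a slash (as in L/R) as a set of alternatives at that position. The helper names are hypothetical, and counting a position as matching whenever the alternatives share a residue is a simplifying assumption.

```python
FIRST_HALF = "E-E-D-Q-L/R-V-E-R/T-L-I-E-K-E-G-Q-E-S-L-S-S-P-E"
SECOND_HALF = "E-E-D-Q-E-T-D-R-P-L-E-K-E-N-G-E-P-L-K-P-V-E"

def parse(repeat):
    """Split a hyphen-separated sequence into per-position sets of residues."""
    return [set(token.split("/")) for token in repeat.split("-")]

def matching_positions(a, b):
    """Return the 1-based positions at which the two repeats share a residue."""
    return [i + 1 for i, (x, y) in enumerate(zip(parse(a), parse(b))) if x & y]

matches = matching_positions(FIRST_HALF, SECOND_HALF)
print(f"{len(matches)} of {len(parse(FIRST_HALF))} positions match: {matches}")
```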


4. Invertebrate IF 


Uniquely, the cephalochordate C1 and C2 chains both contain heptad
repeats of overall length 64 residues in their tail domains (Riemer et al., 
1998). There is no direct evidence, however, of the role that these repeats 
play in vivo, but it is evident that such sequences are not present by 
chance. There can be no doubt that this repeat is designed to facilitate 
oligomerization between chains. 


J. Features in Head and Tail Domains 


The head and tail domains in keratin molecules generally contain a 
multitude of sites that allow keratin IFs to form covalent bonds with other 
proteins. For example, in the case of the keratin IFs in the inner root 
sheath, lysine to glutamine crosslinks have been characterized between the 
head and (more often) the tail domains in (generally) both Type I and 
Type II chains, envoplakin, epiplakin, the SPR2 (small proline-rich) family,
and trichohyalin (Steinert et al., 2003). Likewise, in noninner root
sheath epidermal keratin chains, crosslinks have been identified between
the lysine residue in the K-S-I-S-I-S motif in the V1 subdomain of the Type
II head domain and glutamine residues in cystatin, elafin, envoplakin,
involucrin, loricrin, and the SPR1, 2, and 3 families of the cornified
cell envelope (Candi et al., 1998; Steinert and Marekov, 1995, 1997). It
is interesting and pertinent that the lower crosslink density in these IFs
probably arises in large part from the fact that only a single contributory
lysine site occurs, and in just one of the two chain types. In both of
the examples listed here, the process is mediated by transglutaminase
crosslinking.

Fig. 6. Schematic representation of the tail domains in Type IV intermediate
filament chains. The rod domain, indicated by a shaded rectangle, does not show
its known substructure. The four chains illustrated here are, from the top downwards,
α-internexin, and the low (NF-L), medium (NF-M), and high molecular weight chains
(NF-H) from neurofilaments. The characters of the segments are indicated by the one
letter code. Figure is redrawn from Parry and Steinert (1995) and is based on an
original by Shaw.

Parry and Steinert (1999) noted that posttranslational modifications 
were found almost (but not quite) exclusively in the head and tail domains 
of all IF proteins. Nothing in the five years since their review has changed 
that conclusion, though more examples of amino acid modification 
are now known. A selection of posttranslational modifications is noted 
below, but a comprehensive list has not been attempted and is consid- 
ered beyond the scope of this review. Useful references to posttranslation- 
al modifications include those of Quinlan et al. (1994), Parry and Steinert 
(1999), and Herrmann and Aebi (2000). 

A posttranslational modification of major significance in IFs relates to 
the phosphorylation and dephosphorylation events that occur at serine 
and/or threonine residues within specific recognition sites. These sites are 
located entirely in the head and tail domains. It has now been widely 
established that the actions of kinases, which lead to phosphorylation, and
phosphatases, which cause dephosphorylation, are the crucial biochemical
events that regulate (or partially regulate) the assembly and the turnover
of a wide variety of IFs in vivo. They also can modify the interactions 
between IFs and various cellular entities. For instance, during cell division 
the phosphorylation of some IFs (cytokeratins, vimentin, lamin) results 
in filament disassembly; the inverse event—dephosphorylation—results in 
spontaneous self-assembly to form intact filaments. 

In the medium and large neurofilament chains, where there are numer- 
ous K-S-P phosphorylation sites in the tail domains, the effect of phos- 
phorylation is quite different. It has no visible effect on the state of 
filament assembly. It does, however, appear to be particularly important 
in determining axonal diameter (and concomitant conduction velocity), 
as well as transport properties and association with other cytoskeletal 
components. Experimentally, numerous phosphorylation sites have been 
shown to exist in a wide variety of IF proteins. Many others have 
been proposed on the basis of sequence motifs consistent with sites of 
known kinases. It has also been shown that mutations in which phosphor- 
ylation sites have been changed (see, for example, S35A in keratin 19) lead 
to various pathologies, including malformations in the filament assembly. 

Other posttranslational modifications seen in IFs include proteolysis 
(cytokeratin, lamin, neurofilaments: Klymkowsky et al., 1991; Nigg, 1992) 
and, in the case of lamin, an isoprenylation event at the CaaX box. The 
latter, which is unique among IF chains, occurs at the C-terminal end of 
the chain (Hennekes and Nigg, 1994; Nigg, 1992) and has the effect 
of targeting lamin chains to the nuclear envelope. Another quite different 
posttranslational modification is that of mono-ADP-ribosylation, which 
appears to modify desmin assembly (Yuan et al., 1999). Glycosylation is 
yet another posttranslational modification, and a wide variety of IF chains 
have been reported as being glycosylated. These include some keratins, 
lamins, and some neurofilament chains. Glycosylation sites, like those of 
many other (but not all) posttranslational modifications, occur in both the 
head and tail domains (Quinlan et al., 1994). 


III. SUMMARY


The quarter-century that has elapsed since the first sequences of IF proteins
were published and analyzed has seen a great deal of new information 
about the chain and molecular structure, as well as the modes by which 
the molecules assemble to form viable filaments. As the research has pro- 
gressed, it has become possible to identify the structural and/or functional 
role of various subdomains within the IF chains, and even of individual 
amino acid residues. This review has attempted to categorize that knowledge,
to identify residues that are highly conserved and which are of
particular structural/functional importance, and to ascertain both intra- 
and interchain interactions that are especially crucial in either stabilizing 
or nucleating the coiled-coil structure. 

It is evident that specific residue-related features observed in the 
rod domain of one particular IF chain type are frequently not observed 
in the same place (if at all) in the other chain types and, of course, the 
large differences in head and tail structure and sequence between 
chain types preclude identical roles from occurring there as well. The 
highly specialized sequence characteristics thus define unique structur- 
al assemblies and functions, while still maintaining a high degree of 
structural uniformity, particularly in the manner of rod domain assembly.
Even in this case, small but important differences in the A11, A22,
and A12 modes occur. Distinct IF structures have been identified for:
(1) unoxidized trichocyte keratin and epidermal keratin; (2) oxidized 
trichocyte keratin; (3) Type III and IV IF proteins; and (4) the nuclear 
lamins. 

Much more work is required before we will have a detailed understanding of
IF structure, but real progress has been made. We can realistically expect 
that the next ten years will see new studies completed that will add greatly 
to our knowledge. Crystal studies of IF protein fragments, now in progress, 
will add immeasurably to that understanding. 
ABSTRACT


Intermediate filament associated proteins (IFAPs) coordinate interac- 
tions between intermediate filaments (IFs) and other cytoskeletal elements 
and organelles, including membrane-associated junctions such as desmo- 
somes and hemidesmosomes in epithelial cells, costameres in striated 
muscle, and intercalated discs in cardiac muscle. IFAPs thus serve as 
critical connecting links in the IF scaffolding that organizes the cytoplasm 
and confers mechanical stability to cells and tissues. However, in recent 
years it has become apparent that IFAPs are not limited to structural 
crosslinkers and bundlers but also include chaperones, enzymes, adapters,
and receptors. IF networks can therefore be considered scaffolding 
upon which associated proteins are organized and regulated to control 
metabolic activities and maintain cell homeostasis. 


I. INTRODUCTION 


Intermediate filaments (IFs), along with actin-containing microfila- 
ments and microtubules, comprise the three major nonmuscle cell cyto- 
skeleton proteins. IFs fall into five major families that exhibit tissue 
specific expression patterns (Coulombe and Omary, 2002; Coulombe 
and Wong, 2004) (also see Chapter 5). Within the cell, IFs are associated 
with the nuclear surface and extend out into the cytoplasm, where they 
interact with other cytoskeletal tracks and cytoplasmic organelles such as 
mitochondria. They also impinge on the plasma membrane in certain 
tissues where they are tethered by ultrastructurally visible organelles and 
specialized membrane densities such as desmosomes and hemidesmo- 
somes in epithelial cells, costameres in striated muscle, and intercalated 
discs in cardiac muscle. IFs thus serve as integrators of cytoplasmic space 
within cells and of cells within tissues, consequently conferring mechanical 
stability to organs. 

Interactions between IFs and other cytoskeletal elements and organelles 
are coordinated by intermediate filament associated proteins (IFAPs); 
many of these proteins have emerged as critical regulators of tissue 
integrity in their own right (Coulombe and Wong, 2004; Fuchs and Yang, 
1999; Herrmann and Aebi, 2000; Houseweart and Cleveland, 1999). In 
addition, however, it is clear that IFAPs are not limited to crosslinkers and 
bundlers but also include chaperones, enzymes, adapters, and receptors 
whose cellular roles transcend that of structural proteins (Fig. 1). In fact, 
IF networks now might be considered as signaling scaffolds upon which 
associated proteins are organized and regulated to control metabolic 
activities and maintain cell homeostasis. Interactions between nuclear 
lamins and nuclear-associated proteins are not only required for the 
normal organization of the karyoskeleton, but have been implicated in 
regulating gene transcription and cell proliferation. This latter topic is 
sufficiently complex that we will not cover it in detail in this Chapter, but 
will refer the reader to other excellent recent reviews in this area (see 
Goldman et al., 2002; Herrmann and Foisner, 2003; Zastrow et al., 2004). 
Finally, in some cases, there is a fine line between IFs and IFAPs. Certain 
IFs harbor extended C-terminal tails that project from the surface of 
the filament, providing connecting arms that associate with neighboring
filaments. In some cases, IF polypeptides that coassemble at substoichiometric
levels with partner proteins are thought to play critical roles in
anchoring filaments to other structures and cellular membranes. In the
current Chapter, we will focus on cytoplasmic IFs and their associated
proteins; the latter are both organizers of IFs and targets for regulation by
this highly diverse class of cytoskeletal polypeptides.

Fig. 1. Diagram of intermediate filaments (IFs) and their associated proteins.
Various IFs on the left are color- and shape-coded and matched with interacting
partners on the right. For interactions of nuclear lamins with their associated
proteins refer to Zastrow et al. (2004). This figure is not meant to be
comprehensive but is rather a summary of those interactions highlighted in this
review.


II. PLAKIN FAMILY


One major family of IFAPs was first recognized in 1991 and subsequently 
emerged as a gene family providing crosslinking functions for all cytoskel- 
eton networks and the plasma membrane. These proteins are now collec- 
tively referred to as the plakin family (Jefferson et al., 2004; Leung et al., 
2002; Ruhrberg and Watt, 1997), a name which originally derived from the 
location of many of these proteins in the electron-dense cytoplasmic 
plaque of cytoskeletal anchoring junctions. It is now known that the
extended family includes modular molecules assembled from domains
that together make up enormous proteins. Plakins can contain IF-binding,
actin-binding (ABD), coiled-coil rod, spectrin repeat-containing, and
growth arrest specific protein 2 (GAS2)-related microtubule (MT)-binding
(GAR) domains, as well as domains that target certain members to junctions
such as desmosomes or hemidesmosomes (Jefferson et al., 2004) (Fig. 2). The
largest plakins contain spectrin repeats and have now been dubbed
spectraplakins (Roper et al., 2002). 

The most ancient plakin genes, found in the invertebrates C. elegans and 
Drosophila, are called VAB-10A/B and SHORTSTOP (kakapo), respective- 
ly, and play important roles in adhesion and tissue morphogenesis. These 
primitive versions of the plakins are highly modular in character, suggest- 
ing that several of the more recent epithelial versions were streamlined for 
specialized functions, primarily as IF-binding cytoskeletal crosslinkers. 
Interestingly, Drosophila does not even have cytoplasmic IFs; the regions 
most closely resembling IF-binding domains have other functions in junc- 
tion localization (Roper and Brown, 2003), suggesting either that IF- 
binding functions were lost or that these domains were later tailored for 
IF binding. In this review, we will focus on those plakin family members 
that interact with IFs, beginning with the founding members of the family: 
desmoplakin, BPAG1, and plectin. We will then move on to discuss the
more recently identified envoplakin and periplakin molecules, and the 
unusual epiplakin protein. 

The majority of plakins have been localized to anchoring junctions that 
mediate either intercellular or cell-substrate adhesion, known as desmo- 
somes and hemidesmosomes, respectively (Fig. 3) (for reviews refer to 
Borradori and Sonnenberg, 1999; Gaudry et al., 2001; Getsios et al., 2004; 
Godsel et al., 2004; Jones et al., 1998; Koster et al., 2004a). These junctions 
share a basic blueprint including transmembrane components that inter- 
act either with extracellular matrix molecules (hemidesmosomes) or 
transmembrane partners in adjacent cells (desmosomes). In hemidesmo- 
somes, the transmembrane components are members of the integrin 
family as well as BPAG2/BP180; in desmosomes, the desmosomal cadher- 
ins (desmogleins and desmocollins) serve this role. In each type of junc- 
tion, the cytoplasmic tails of these transmembrane molecules act as a 
scaffolding for the organization of a protein complex that anchors IFs 
to the plasma membrane. Specialized cell-cell and cell-substrate junctions 
in endothelial cells, which do not assemble desmosomes or hemidesmo- 
somes, have also recently emerged as important contact sites for IFs 
(Tsuruta and Jones, 2003; Vincent et al., 2004) (Fig. 3B, D). In each case, 
plakin family members serve as the IFAP component of the complex. 
Their critical role is underscored by the existence of human inherited
diseases and engineered animal models in which their tethering
function is inactivated, leading to tissue fragility (Godsel et al., 2004; Jonkman,
1999; Koster et al., 2004a; McMillan and Shimizu, 2001). In the following
section, each of these anchoring junction-associated cytolinkers will be 
discussed. The reader is referred to recent reviews for details on other 
plakin family members (Jefferson et al., 2004). 


[Fig. 3 panel labels: Desmoplakin, Keratin IF, Vimentin IF, Plectin, Laminin-8/9]

Fig. 3. A comparison of cell-cell and cell-substrate anchoring junctions. (A)
Desmosomes anchor keratin filaments through desmoplakin. One half of a
desmosome, which is an intercellular junction that anchors intermediate filaments
(IFs) to the plasma membrane, is shown. The transmembrane desmosomal cadherins,
desmogleins (Dsg) and desmocollins (Dsc), mediate adhesion through their
extracellular domains, and associate with plakophilins (Pkp) and plakoglobin (Pg)
through their cytoplasmic domains. These proteins in turn interact with the
N-terminus of the plakin family member desmoplakin, which anchors IF to the
junction through its C-terminus. (B) Endothelial VE-cadherin-based junctions
anchor vimentin through desmoplakin. Instead of desmosomes, endothelial cells
(Vincent et al., 2004) have specialized intercellular junctions that anchor both
actin and vimentin IFs to VE-cadherin-based plasma membrane domains. Vimentin
interactions are mediated by desmoplakin, which associates with the VE-cadherin
tail via the armadillo proteins plakoglobin and p0071. (C) Hemidesmosomes anchor
keratin filaments through plectin and BPAG1. In epithelial cells, hemidesmosomes
anchor keratin IFs to the cell substratum. Transmembrane α6β4 integrin (the
receptor for laminin 5) and BPAG2/BP180 associate with the plakin family members
plectin and BPAG1, which both tether keratin to the plaque through their
C-terminal IF-binding domains. The tetraspanin CD151 is thought to play a key
role in the assembly of these structures (Nievers et al., 1999). (D)
Vimentin-anchoring contacts in endothelial cells. Instead of hemidesmosomes,
endothelial cells have specialized focal contact-like structures that anchor
vimentin to αvβ3 integrin complexes at the cell substratum. Although the precise
organization of these structures has not been elucidated, it is likely that the
plakin family member plectin anchors IFs to these junctions, which also contain
proteins such as vinculin that are found in actin-based focal contacts.


A. Desmoplakin


Desmoplakins (DP) I and II were first identified biochemically as abundant
components of enriched fractions of desmosomes (Skerrow and
Matoltsy, 1974) and were subsequently found to be ubiquitous constituents
of these organelles present in all epithelial tissues, heart, meningeal
cells, and follicular dendritic cells of lymph nodes. DP is also found
outside of desmosomes in specialized endothelial junctions (Schmelz
and Franke, 1993; Schwarz et al., 1990; Vincent et al., 2004). DP I and II
are encoded by alternatively spliced products derived from a single gene
that gives rise to 332 and 260 kDa proteins, respectively. DP II is missing a
599-amino acid region of the rod domain, but contains N- and C-termini
identical to DP I, and appears to have a more limited tissue distribution
(Angst et al., 1990). Predictions from sequence analysis as well as empirical
data suggest that DP is a dumbbell-shaped molecule with a central
α-helical coiled-coil rod domain, flanked by globular N- and C-terminal
domains (Fig. 2) (Green et al., 1990, 1992a,b; O’Keefe et al., 1989; Virata
et al., 1992). This basic structure is found in several other plakins, including
the 230 kDa hemidesmosomal protein bullous pemphigoid antigen 1
(BPAG1) and the cytoskeletal linking protein, plectin (Green et al., 1992b;
Sawamura et al., 1991; Wiche et al., 1991), as well as the cell envelope and
junction proteins envoplakin and periplakin (Ruhrberg and Watt, 1997;
Ruhrberg et al., 1996, 1997) (Fig. 2). A more recently described family
member, epiplakin, contains an extended region homologous to the
C-terminus of other plakins but lacks the N-terminal and rod domains.

Analysis of DP cDNA sequence demonstrated that the DP C-terminus
comprises three subdomains each made up of 4.5 copies of tandemly
organized 38-residue repeats with intervening or linking segments (Green
et al., 1990). It was hypothesized, based on the periodicity of charged 
residues in these domains, that the DP C-terminus interacts with IFs. This 
prediction was borne out by cell culture, in vitro overlay, and yeast two 
hybrid (Y2H) approaches (Bornslaeger et al., 1996; Kouklis et al., 1994; 
Meng et al., 1997; Stappenbeck and Green, 1992; Stappenbeck et al., 
1993, 1994). More recently, X-ray crystallographic analysis of individual 
subdomains confirmed that each DP repeat (referred to as a plakin 
or plectin repeat, or PR) has a substructure 38 residues in length. Each 
PR comprises an 11-residue β-hairpin followed by two antiparallel
α-helices, representing a novel structural fold. Rather than forming an
extended structure typical of other well-known repeating structures, the 
individual repeats fold to form a discrete globular structure called 
the plakin repeat domain (PRD). The structures of the different PRDs 
are similar, and the authors speculate that the PRDs in plakin family 
members are independent units organized like beads on a string. 

The C-terminal region containing these PRDs has been shown to asso- 
ciate with IFs from the range of tissues in which desmosomes are found, 
including simple and complex epithelial keratins, vimentin, and desmin. 
The precise mechanism by which these interactions occur is still a matter 
of debate, however. Based on structural studies cited above, it was pro- 
posed that conserved charged residues on the surface of the PRD could be 
involved in functional interactions with IFs. Indeed, each of the PRDs
was shown by cosedimentation analysis to exhibit low affinity
binding to vimentin (Choi et al., 2002), and addition of the highly con- 
served linker did not detectably increase this association. However, anoth- 
er study demonstrated that the intervening highly conserved linker 
downstream of the “B” PRD is critical for binding to vimentin and the 
simple epithelial keratins K8/18. The latter finding is in line with the 
observed importance of this region in other plakin members for binding 
IFs (Nikolic et al., 1996). Unfortunately, the linker region was not part of 
the crystal structure and future studies will be required to sort out the 
relative importance of these different sequences. 

Studies have also pointed out that there is sequence specificity in DP 
interactions with various IFs (Fontao et al., 2003; Meng et al., 1997; 
Stappenbeck et al., 1993). Although the details of these studies differ, 
collectively they highlight the importance of the “C” PRD plus the last
68 residues for DP’s association with keratins and the “B” PRD plus linker
for type III IF interactions. In addition, phosphorylation of serine 2849,
which is 23 residues from the DP C-terminus, impairs interactions with 
IFs and may be involved in regulating the size of the DP pool that is 
competent to incorporate into assembling desmosomes (Fontao et al.,
2003; Meng et al., 1997; Stappenbeck et al., 1994). 

The identity of IF sequences required for mediating association with the 
DP C-terminus is also a matter of some debate. Earlier in vitro overlay and 
yeast two hybrid analysis suggested that the presence of a single type II 
epidermal keratin, either K5 (Kouklis et al., 1994) or K1 (Meng et al.,
1997), is sufficient for mediating an association with DP. A region comprising
the KSIS motif present in the non-α-helical head domain of the
type II epidermal keratin K5 was implicated as critical for this interaction
(Kouklis et al., 1994). The later studies suggested that in the case of K8/18,
both partners must be present for a productive interaction with DP, and
implicated the rod in this association (Meng et al., 1997).
However, more recent Y3H analysis suggests that the presence of the
keratin heterodimer is critical, not just for K8/18 but also for K5/K14,
thus pinpointing the rod domain as the most likely partner for DP (Fontao
et al., 2003). Together, these results support the idea that the DP
C-terminus utilizes distinct sets of overlapping sequences to tailor binding 
to cell type specific IFs. 

The DP N-terminus is required for incorporation into the desmosomal 
plaque (Bornslaeger et al., 1996; Smith and Fuchs, 1998; Stappenbeck et al., 
1993). It contains a region that is conserved throughout all plakin family 
members, with the exception of epiplakin, now referred to as the plakin 
domain. This domain was originally predicted to form a globular structure, 
characterized by heptad repeats forming a series of antiparallel bundles. 
More recently, this prediction has been refined to describe a series of 
spectrin repeats surrounding a putative SH3 domain (Jefferson et al., 
2004). Coimmunoprecipitation, in vitro binding, and yeast two hybrid 
studies have demonstrated that the plakin domain contains binding 
sites for the armadillo proteins plakoglobin, plakophilins 1-3, and 
p0071, and may also bind directly to the desmosomal cadherins desmo- 
glein 1 and desmocollin 1. The multiplicity of possible interactions be- 
tween the DP N-terminus and other junction components provides a 
mechanism for fine-tuning the architecture, and accordingly the function, 
of the junctional plaque (Bonne et al., 2003; Bornslaeger et al., 2001; 
Calkins et al., 2003; Chen et al., 2002; Hatzfeld et al., 2000, 2003; Kowalczyk 
et al., 1997, 1999; Smith and Fuchs, 1998; Troyanovsky et al., 1993). High 
resolution immunogold localization analysis, using antibodies directed 
specifically against the N- and C-termini of DP, is consistent with their 
functions in membrane targeting and IF anchoring, respectively, showing 
that the N-terminus is located closer to the plasma membrane and the 
C-terminus is deeper in the cytoplasm where IFs are anchored to the 
plaque (North et al., 1999). 

The critical importance of DP throughout development and in adult 
epidermis and heart is underscored by the existence of both naturally 
occurring mutations in humans and engineered mutations in mice (for 
review see Godsel et al., 2004). Complete loss of DP results in an embryonic 
lethal phenotype at day 6.5 and characterization of these embryos revealed 
defects in desmosome assembly and association of K8/18 in the embryonic 
trophectoderm (Gallicano et al., 1998). Animals in which extraembryonic 
tissues were rescued did survive for several days longer, but eventually 
succumbed due to defects in heart, neuroepithelium, epidermis, and 
microvasculature, suggesting the widespread importance of DP in main- 
taining tissue integrity (Gallicano et al., 2001). Conditional knockout of DP 
in mouse skin, along with a variety of mutations in humans, results in skin 
fragility defects sometimes accompanied by cardiomyopathy (Alcalai et al., 
2003; Armstrong et al., 1999; Norgett et al., 2000; Rampazzo et al., 2002; 
Vasioukhin et al., 2001). Although the underlying cellular basis for these 
tissue defects is still under active study, it has been shown that an engi- 
neered mutation uncoupling IFs from desmosomes dramatically decreases 
the mechanical strength of keratinocyte cell sheets in vitro. Less severe, 
naturally occurring truncations of the DP C-terminus also lead to 
decreases in intercellular adhesive strength (Huen et al., 2002). 


B. BP230 (BPAG1) and its Isoforms 


BPAG1 was first identified as a 230 kDa autoantigen in bullous 
pemphigoid (Stanley et al., 1981), an autoimmune disease characterized by the 
appearance of large skin blisters. It was subsequently described as a 
component of hemidesmosomes (Westgate et al., 1985). Together with 
plectin, BP230 mediates keratin filament association to these cell-substrate 
anchors (for review see Jones et al., 1998; Koster et al., 2004a; Nievers et al., 
1999). Like DP, BPAG1 performs its tethering function by binding to 
membrane receptors through its N-terminus (in this case, the laminin-5 
receptor α6β4 integrin and the transmembrane glycoprotein BP180; 
Hopkinson and Jones, 2000; Koster et al., 2003) and to IFs through its 
C-terminus (Fontao et al., 2003; Leung et al., 2001b; Yang et al., 1996) 
(Fig. 3C). 

Due to its prominent expression in hemidesmosomes, it was predicted 
that knockout of BPAG1 would result in blistering defects in the skin 
of mice. Indeed, focal blistering in response to stress was observed in mice 
lacking BPAG1/BP230 (Guo et al., 1995). Surprisingly, however, disruption of 
the BPAG1 locus, either by an engineered mutation (Guo et al., 1995) or by 
spontaneously occurring mutations in the dystonia musculorum (dt) mouse (Brown 
et al., 1995), also led to degeneration of sensory neurons and progressive 
dystonia in mice. 

This provocative phenotype led to the discovery that the BPAG1 locus 
was larger than originally thought (Sawamura et al., 1990). It encodes 
additional modules characteristic of the larger plakin family members, 
giving rise to four distinct isoforms (Fig. 2). The neuronal defects 
observed in the mutant mice were recently ascribed to loss of BPAG1-a 
(Leung et al., 2001b), a giant isoform containing calponin homology 
(CH) domains 1 and 2 (comprising an actin binding domain) whose 
expression in skeletal muscle may also explain muscle abnormalities 
observed in dt mice (Dalpe et al., 1999). Somewhat surprisingly, until very 
recently no corresponding human mutations in BPAG1 had been identified. 
In a recent report, however, patients with a possible BPAG1 
haploinsufficiency, with selective disruption of muscle- and brain-specific 
isoforms, were described as having esophageal atresia and psychomotor 
retardation (Giorda et al., 2004). 

Four known isoforms expressed from the BPAG1 locus result from 
differential splicing (Leung et al., 2001a; Okumura et al., 2002; Yang et al., 
1999). The common structural component of all isoforms is the plakin 
domain, which mediates binding of BP230 (also known as BPAG1-e) 
to BP180 (collagen XVII or BPAG2) (Hopkinson and Jones, 2000) and 
β4 integrin (Hopkinson and Jones, 2000; Koster et al., 2003). The plakin 
domain in both BPAG1-e and BPAG1-n is followed by a coiled-coil rod 
domain and two plakin repeat domains B and C, which are connected by a 
linker subdomain. Like the giant isoforms BPAG1-a (619 kDa) and 
BPAG1-b (824 kDa), BPAG1-n contains an actin binding domain consisting 
of two calponin-homology (CH) domains at its extreme N-terminus. 
Other features of the a and b isoforms are a spectrin repeat-containing rod 
domain and a microtubule binding domain at the very C-terminus. 
BPAG1-b, which is expressed in heart and skeletal muscle, also contains 
a plakin repeat domain, and could theoretically serve as integrator of all 
three filament types. 

The plakin repeat domains of BPAG1 were shown to bind to and 
colocalize with IFs; however, conflicting reports exist about their specificity 
for different IF types. In one case, data suggested that BPAG1-n colocalizes 
with neurofilaments, and the BPAG1 plakin repeat domain 
immunoprecipitated with NF-protein (Yang et al., 1996). Another group reported 
that the type III neuronal IF protein peripherin interacts with the BPAG1 
C-terminus (Leung et al., 1999). The potential ability of the PR containing 
domain in the “b” form to bind IFs awaits confirmation (Jefferson et al., 
2004). Recently, yeast three hybrid experiments showed an interaction 
between the C-terminal domain and the keratin 5/14 heterodimer, but 
not keratin 8/18 or vimentin (Fontao et al., 2003), in contrast with the 
observed broader spectrum of DP binding partners. This difference in 
the repertoire of BPAG1 and DP partners most likely reflects the wider 
array of IFs that DP associates with in vivo in the desmosomes of heart, 
meninges, and simple and complex epithelia. 


C. Plectin 


Plectin and IFAP300 were identified independently as components of 
high salt/Triton X-100 insoluble, IF-rich preparations derived from cul- 
tured cells and tissues (Lieska et al., 1985; Pytela and Wiche, 1980; Wiche 
et al., 1982; Yang et al., 1985). Along with hemidesmosomal protein 1 
(HD1), they have all recently been shown to be identical proteins (Clubb 
et al., 2000; Herrmann and Wiche, 1987; Okumura et al., 1999). Plectin is 
present in a wide range of tissues and cells (Wiche, 1989; Wiche et al., 
1983, 1984). This versatile plakin functions as a connector between all of 
the nonmuscle cytoskeletal networks, and as an anchoring protein at 
plasma membrane specializations including hemidesmosomes, desmo- 
somes, intercalated discs, focal contacts, costameres, and Z-lines of striated 
muscle sarcomeres (Wiche et al., 1983). In complex epithelia, plectin is 
concentrated in basal layer cells, reflecting its localization in hemidesmo- 
somes; its function has been clearly demonstrated by engineered and 
naturally occurring mutations that affect hemidesmosome integrity. But 
plectin has also been shown to be a component of desmosomes (Skalli 
et al., 1994), where its presence may be due in part to demonstrated 
interactions with its fellow plakin member, desmoplakin (Eger et al., 
1997). The functional significance of plectin in desmosomes is not as well 
understood, and patients with plectin mutations do not exhibit obvious 
desmosome defects. 

The cDNAs of human (Liu et al., 1996; McLean et al., 1996) and rat 
(Elliott et al., 1997; Wiche et al., 1991) plectin each encode an approxi- 
mately 4600 aa long polypeptide with a predicted molecular mass of over 
500 kDa. Structural studies indicate that (like other epithelial plakin 
family members) plectin consists of two terminal globular domains and 
a central α-helical coiled-coil rod, and appears as a dumbbell in electron 
microscopic preparations (Foisner and Wiche, 1987; Wiche, 1989, 1998). 
Its central α-helical coiled-coil rod is flanked by an N-terminus 
comprising CH1, two ABD modules, and a plakin domain, and a C-terminus 
consisting of six PRDs containing the conserved linker 
downstream of the fifth PRD. Like DP, plectin contains a terminal GSR domain 
that is highly conserved between these two plakin family members. This 
region in plectin, but not DP, binds to MT (Sun et al., 2001). The fact that 
plectin has the ability to associate with actin, MT, and IFs has led to the 
idea that it functionally integrates all three structural components of the 
cytoskeleton in cells, a concept dramatically illustrated by electron 
microscopical analysis of plectin “sidearms” physically bridging these elements 
(Foisner et al., 1995; Svitkina et al., 1996; Wiche, 1998). 

A number of studies have dissected the role of specific sequences in 
conferring cytoskeletal linking functions to plectin. Plectin’s C-terminal 
domain is built from six PRDs and includes the conserved linker region 
downstream of the fifth repeat. This linker comprises binding sites for 
vimentin and K8/18, as shown by in vitro binding assays and colocalization 
of wild-type and mutant protein fragments in transfected cells (Nikolic 
et al., 1996). Plectin also binds to GFAP, all three neurofilament proteins 
(Foisner et al., 1988), and lamin B (Foisner et al., 1991). As with DP, 
phosphorylation of plectin regulates its association with various IFs. 
Phosphorylation by protein kinases A (PKA) and C (PKC) decreases lamin B 
binding, whereas interaction with vimentin is enhanced by PKA phosphorylation 
of plectin (Foisner et al., 1991). 

The N-terminal plectin ABD consists of two calponin-homology 
domains, each comprising four primary α-helices connected by loops and 
shorter helices. This functional actin-binding site exhibits the same affinity 
for actin as that of other similar actin-binding proteins (Andrä et al., 1998; 
Fontao et al., 2001; Geerts et al., 1999). The N-terminal actin binding and 
plakin domains are also involved in binding to β4 integrin, an association 
that competes with actin binding (Geerts et al., 1999; Koster et al., 2004b). 
An additional binding site for β4 integrin is found in the final PRD 
(Rezniczek et al., 1998). X-ray crystallographic analysis of the plectin 
ABD revealed that it is folded similarly to other ABDs (Garcia-Alvarez 
et al., 2003; Sevcik et al., 2004). The two calponin homology domains 
form a closed structure that undergoes a conformational 
change upon binding of actin, but not β4 integrin. This latter 
observation may help explain how plectin competitively binds F-actin and 
α6β4 (Garcia-Alvarez et al., 2003). 

The study by Sevcik et al. (2004) also reported an additional vimentin 
binding site in the first CH-domain of plectin. This region shows high 
homology to fimbrin, another CH domain containing protein that associ- 
ates with IFs, as described in more detail below. Unlike the C-terminal IF 
binding site, this N-terminal site does not bind to filamentous vimentin as 
judged by cosedimentation experiments (Sevcik et al., 2004). Based on 
these data, the authors propose that this additional N-terminal IF binding 
site of plectin may regulate dynamics, rather than stabilization of the IF 
network. 

The plectin gene contains more than 40 exons encompassing over 32 kb 
on chromosome 15, generating an unusual 5’ transcript complexity and 
giving rise to many plectin isoforms (Rezniczek et al., 2003). Sixteen 
alternatively spliced exons have been identified, 11 of them directly 
spliced into exon 2 encoding the ABD (Fuchs et al., 1999; Rezniczek 
et al., 2003). However, in several cases, differential splicing results in 
retention of only one CH domain. An isoform lacking exon 31, which 
encodes most of the rod domain, was found for rat (Elliott et al., 1997) and 
is presumed for human plectin (Schröder et al., 2000). These isoforms 
show tissue, cell type, and developmental stage variation in expression 
that presumably reflects their multiple and diverse functions (Elliott et al., 
1997; Fuchs et al., 1999; Rezniczek et al., 2003). On the cellular level, some 
of the isoforms align with actin stress fibers when expressed in cultured 
cells (Rezniczek et al., 2003); others show divergent patterns. For example, 
the recombinant plectin isoform 1f was shown to localize to focal contacts 
(Rezniczek et al., 2003). Thus, it could serve as a potential anchor of IFs 
to the plasma membrane at the site of focal contacts in fibroblasts 
and, possibly, endothelial cells (Gonzales et al., 2001; Seifert et al., 1992) 
(Fig. 3D). 

As described above, both the N-terminal plakin and C-terminal domains 
of plectin bind to the cytoplasmic tail of the β4 integrin component of the 
hemidesmosome (Rezniczek et al., 1998). The importance of plectin to 
the structure and function of the hemidesmosome is indicated by the 
consequences of its loss in both humans and mice. Although plectin is 
dispensable for the formation of hemidesmosomes per se, the skin of 
plectin (-/-) mice blisters due to hemidesmosome perturbation 
(Andrä et al., 1997; Pulkkinen et al., 1996; Uitto et al., 1996). 
The plectin isoform 1a was shown to localize to 
hemidesmosomes in rat skin preparations (Rezniczek et al., 1998) and to rescue 
hemidesmosome-like anchoring contacts in plectin-deficient keratinocytes 
(Andrä et al., 2003). Moreover, mice and humans lacking plectin also 
exhibit muscle deficiencies. This is not surprising, since plectin is a 
component of Z-disks of striated muscle, dense plaques of smooth muscle 
cells, and intercalated disks of cardiac muscle (Wiche et al., 1983). Indeed, 
in striated muscle, plectin serves not only as a linker of desmin to the 
Z-disks (Hijikata et al., 1999), but also plays a role in extending the desmin 
IF network to the costameric dense plaques (Hijikata et al., 2003) where it 
may bind to vinculin through actin. Interestingly, Fuchs et al. (1999) found 
that the isoform prevailing in muscle, which contains the differentially 
spliced exon 2a, has greater actin binding activity than the isoform lacking 
this small exon. 

Most recently, it has been proposed that plectin may function as a 
signaling scaffold. A yeast two hybrid screen uncovered the PKC anchoring 
protein, RACK1, as a novel interacting protein. The authors went on to 
show that EGF stimulates the formation of a plectin-RACK1-PKC complex, 
coinciding with a redistribution of RACK1 from the cytoskeleton fraction to 
the soluble pool (Osmanagic-Myers and Wiche, 2004). In plectin null cells, 
RACK1 was mislocalized, exhibiting a reduction in cytoskeletal association. 
The authors concluded that plectin may function to sequester RACK1 
on the cytoskeleton in uninduced cells and then to shuttle it to other 
locations, possibly to membrane sites, upon EGF induction. 


D. Envoplakin and Periplakin 


Envoplakin and periplakin were first identified as 210 and 195 kDa 
transglutaminase substrates that were expressed at higher levels during 
keratinocyte differentiation, and became crosslinked into the assembling 
cornified cell envelope (Simon and Green, 1984). Both envoplakin and 
periplakin were originally reported as present in complex epithelial tissues 
but not most simple epithelia, mesenchymal tissues, or the heart. More 
recent studies suggest that periplakin is expressed more widely in a 
number of simple epithelia as well (Kazerounian et al., 2002). 

Like other epithelial members of the plakin family, envoplakin and 
periplakin are built from a plakin domain-containing N-terminus, a central 
α-helical coiled-coil rod domain, and a C-terminus (Ruhrberg 
and Watt, 1997; Ruhrberg et al., 1996, 1997) (Fig. 2). In contrast to other 
plakin family members, both envoplakin and periplakin have relatively 
short C-termini. Envoplakin contains the so-called linker domain, followed 
by a single PRD and short tail, whereas periplakin contains only the 
“linker.” Also in contrast to other plakin family members, the distribution 
of acidic and basic residues in the rod domains of envoplakin and 
periplakin suggests that they may be capable not only of homodimerizing, but 
also of heterodimerizing. This prediction was supported by 
coimmunoprecipitation studies showing that the proteins exist in a complex 
(DiColandrea et al., 2000). Recent biochemical analysis revealed that envoplakin and 
periplakin form soluble complexes with equimolar stoichiometry, and that 
envoplakin requires the presence of periplakin for its solubility. Further 
chemical crosslinking analysis and electron microscopical analysis showed 
that the primary soluble form of the envoplakin/periplakin rod domains 
is a dimer, and that full-length proteins form higher-order oligomers 
(Kalinin et al., 2004). 

Both envoplakin and periplakin have been localized to desmosomes, 
but their distribution is different from other desmosome markers in that 
both envoplakin and periplakin radiate outwards underneath the plasma 
membrane from the plaque proper. Periplakin and envoplakin have been 
shown to participate in the scaffolding of the cornified envelope, becom- 
ing crosslinked by transglutaminases to other cell envelope and desmo- 
some proteins (Steinert and Marekov, 1999) as well as lipids (Marekov and 
Steinert, 1998). Both periplakin and envoplakin/periplakin oligomers 
bind synthetic lipid vesicles, whose composition is similar to the cytoplas- 
mic face of eukaryotic cell plasma membranes. These data support the 
purported role of these plakin family members as scaffolding proteins for 
cornified cell envelope assembly (Kalinin et al., 2004). 

In spite of their shorter C-termini, envoplakin and periplakin appear to 
associate with IFs when coexpressed in cells (DiColandrea et al., 2000), a 
property that depends on their linker domains. Subsequent Y2H and 
overlay studies support a role for the periplakin (but not envoplakin) 
linker, in direct association with K8 and vimentin (Karashima and Watt, 
2002; Kazerounian et al., 2002). It has been proposed that periplakin may 
stabilize envoplakin’s interaction with the IF network (Karashima and 
Watt, 2002). Periplakin may also be necessary for localizing envoplakin 
to cell-cell contact sites, and in addition, appears to harbor a domain in 
its N-terminus that facilitates association with the actin cytoskeleton 
(DiColandrea et al., 2000). As neither envoplakin nor periplakin harbors 
a CH-like ABD in their N-termini, the question remains as to how this 
association is mediated. However, a recent study has identified kazrin as a 
novel periplakin-interacting protein whose binding site maps to the peri- 
plakin N-terminus. Kazrin’s distribution overlaps with that of desmosomes, 
but it is also associated with the membrane cortex in other interdesmoso- 
mal regions, and thus seems like a good candidate for participating in the 
anchorage of periplakin to cortical actin (Groot et al., 2004). 

As with other plakin family members, autoantibodies against envoplakin and 
periplakin are present in patients with the autoimmune disease 
paraneoplastic pemphigus, although this component is not considered 
pathogenic in the disease (Kiyokawa et al., 1998; Mahoney et al., 1998). 
Whereas mice lacking periplakin exhibited no detectable phenotype 
(Aho et al., 2004), ablation of the envoplakin gene led to a relatively 
minor phenotype, including a delay in cornified envelope formation 
and acquisition of barrier function (Maatta et al., 2001). This mild 
phenotype is in striking contrast to that observed in animals deficient in 
desmoplakin or plectin; therefore, it seems likely that envoplakin and 
periplakin work coordinately with other proteins that play compensatory 
roles in the knock out animals. 


E. Epiplakin 


Epiplakin is an unusual plakin family member encoded by a single 
remarkably large exon (>20 kb) giving rise to a protein consisting of 16 
PRDs linked by intervening sequences that do not harbor a 
conserved linker sequence (Spazierer et al., 2003). The protein was 
originally identified by autoantibodies in patients with an autoimmune skin 
disease (Fujiwara et al., 1996) and appears to be restricted to epithelial 
tissues (Fujiwara et al., 2001; Spazierer et al., 2003). Although it has been 
purported to have a role in bundling IFs, and this is an appealing 
speculation based on its unusual structure, very little has been done thus 
far to characterize the function of this newest plakin family member. 


III. CELL SURFACE-IF INTERACTIONS 


Plakins are not the only proteins that tether IFs to the cell surface. A 
number of other proteins localized to cell surface structures including 
desmosomes, focal contacts, and muscle costameres also contribute to IF 
anchorage at plasma membranes. IFs may also associate with cell surface 
receptors outside of ultrastructurally distinct structures, a topic that will be 
dealt with below. 


A. Proteins in Desmosomes 


1. Armadillo Proteins: Plakophilins and Plakoglobin 


Plakophilins and plakoglobin are both members of the larger armadillo 
gene family of dual junctional and signaling molecules. This family is 
named after the Drosophila segment polarity gene armadillo, later found 
to be the invertebrate orthologue of β-catenin (Peifer et al., 1992). This 
latter protein not only plays a key role as an adapter in adherens junctions 
but also as a downstream effector in the Wnt/wingless signaling pathway, 
which governs developmental patterning in the fly and cell fate decisions 
in vertebrates (Nelson and Nusse, 2004). Its central domain consists of a 
series of 12 imperfect repeats, termed armadillo repeats, that provide scaf- 
folding for interaction with binding partners involved in both junction 
structure and transcriptional regulation. In addition to β-catenin, 
members of this family include the junctional molecules plakoglobin, p120 
catenin, and plakophilins (Anastasiadis and Reynolds, 2000; Hatzfeld, 
1999) (Fig. 3). 

Plakoglobin is β-catenin’s nearest relative, and is a constituent of both 
adherens junctions and desmosomes (Zhurinsky et al., 2000). Its central 
armadillo domain binds to both classic and desmosomal cadherins, and its 
shorter flanking N- and C-terminal domains serve regulatory functions. 
The C-terminus in particular appears to control desmosome size, as its 
deletion results in formation of laterally extended desmosomes. This 
region of plakoglobin also contains a number of phosphorylation sites 
that govern protein interactions (Gaudry et al., 2001; Miravet et al., 2003; 
Palka and Green, 1997). Plakoglobin is not generally considered to be a 
major direct binding partner for IFs, but rather an indirect link through 
interaction of its armadillo repeats with the IFAP desmoplakin. However, 
one report suggested that plakoglobin binds weakly to epidermal keratins, 
as shown by in vitro overlay assays (Smith and Fuchs, 1998) and thus in 
certain cases it may contribute more directly to IF binding. 

More substantial data support an interaction between plakophilins 
(PKPs) and IFs. This subfamily of the armadillo family is most similar to 
the adherens junction protein p120 catenin in its structure and organization 
of central armadillo repeats. There are three major desmosomal PKPs 
(1-3) (reviewed in Anastasiadis and Reynolds, 2000; Hatzfeld, 1999). The 
PKPs exhibit cell type specific differences in their tissue distribution and 
protein interaction partners, but collectively they associate with most 
desmosome proteins including plakoglobin, desmoplakin, and the desmo- 
somal cadherin tails (for review see Godsel et al., 2004). Like other 
armadillo family members, the PKPs localize both to junctions and the 
nucleus, although their function in the nucleus is not well understood. 
Unlike plakoglobin, all of the binding sites for desmosome proteins 
identified so far lie in the extended N-terminal domain, rather than the 
central armadillo domain (Bonne et al., 2003; Chen et al., 2002; Hatzfeld 
et al., 2000; Kowalczyk et al., 1999). 

PKP1 (formerly known as Band 6 of desmosome-enriched preparations) 
was first identified biochemically as a keratin-binding protein (Kapprell 
et al., 1988) and later identified by sequence analysis to be a member of the 
armadillo family (Hatzfeld et al., 1994). More recent studies using yeast two 
hybrid and in vitro solid assays demonstrated that PKP1a and PKP2a bind 
to simple and epidermal keratins and, to some extent, vimentin (Hatzfeld 
et al., 2000; Hofmann et al., 2000). PKP1 in particular binds avidly to IFs 
assembled in vitro from CKs 8/18, 5/14, vimentin, or desmin and facil- 
itates formation of thick IF bundles. PKP3 has also been shown to associate 
with K18 (Bonne et al., 2003). The IF binding site on PKP maps to the 
N-terminal head domain, raising the question of how this domain could 
associate with both the membrane complex and IFs at the same time. 
Indeed, the physiological relevance of these interactions has been called 
into question based on the fact that PKP is not normally seen to associate 
with IFs in cells, and its position in the desmosome plaque suggests it 
may be inaccessible to IFs (North et al., 1999). However, it has been 
suggested that in DP-deficient or mutant cells, PKP may participate in 
tethering keratin filaments to the plaque (Norgett et al., 2000; Vasioukhin 
et al., 2001). 


2. Pinin 

Pinin was identified as a 140 kDa protein widely expressed in a number 
of epithelial tissues, brain, and heart, which incorporates into mature 
desmosomes near where keratin filaments impinge upon the desmosomal 
plaque (Ouyang and Sugrue, 1996). Pinin contains three coiled-coil do- 
mains, a small stretch of glycine loops, a short glutamine-proline-rich 
domain and a polyserine domain (Ouyang et al., 1997). Yeast two hybrid 
and in vitro overlay analysis have demonstrated that the N-terminal domain 
of pinin interacts with the 2B rod domain of keratins 18, 8, and 19 (Shi 
and Sugrue, 2000). Interestingly, another 140 kDa protein, identified 
years ago as a lamin B-like protein present in small amounts in prepara- 
tions of bovine muzzle desmosomes, was shown to associate with vimentin 
in an in vitro overlay assay (Cartaud et al., 1990). Any relationship to pinin 
has not been proven, however. 


3. Desmocalmin/Keratocalmin 


Desmocalmin, which may be the bovine equivalent to a human 
protein called keratocalmin (Fairley et al., 1991), was identified in the 
mid-1980s as a desmosome protein that binds both to calmodulin in a 
Ca2+-dependent manner and to reassembled keratin filaments in vitro in 
the presence of Mg2+ (Tsukita and Tsukita, 1985). Unfortunately, due to 
the loss of critical reagents, progress on this potentially interesting regula- 
tor of IF binding has been impeded, and further work will be necessary to 
revisit its potential role in desmosome assembly and regulation. 


B. Polycystin-1 

Polycystin-1 is a widely expressed, large, multispan membrane protein 
that is a target for mutation in most patients with autosomal dominant 
polycystic kidney disease. Mutations in this protein interfere with the 
normal differentiation of renal tubular epithelial cells, resulting in the 
formation of large cystic kidneys. Although the mechanisms underlying 
these mutant phenotypes are poorly understood, the polycystin C-terminal 
domain has been implicated in modulating Wnt signaling by stabilizing 
the armadillo protein β-catenin (Kim et al., 1999b). It may also 
regulate G-protein signaling by similarly stabilizing a regulator of G-protein 
signaling (RGS), RGS7, that is normally rapidly degraded by the 
proteasome pathway (Kim et al., 1999a). 

To gain further insight into the functions of polycystin-1, the C-terminal 
tail (P1CT) was used to screen a kidney yeast two hybrid library. In this 
study, vimentin, K8/K18, and desmin were identified as binding partners, 
and these interactions were confirmed by GST pull down and in vitro 
polymerization assays (Xu et al., 2001). The polycystin-IF interactions 
were shown to be mediated by coiled-coil regions in both interacting 
partners. In addition to associating with the IF cytoskeleton, this versatile 
protein has been shown to specifically colocalize with IF-associated des- 
mosomes (Scheffers et al., 2000; Xu et al., 2001) and to form a complex 
with E-cadherin/β-catenin in kidney cells. This latter interaction is 
impaired in cells from patients with polycystic kidney disease, apparently 
due to increased phosphorylation of polycystin-1 (Roitbak et al., 2004). 
Polycystin-1 is also localized to focal contacts where it closely associates 
with α2β1 integrin (Wilson et al., 1999). It is still unclear whether 
polycystin-1 uses cytoskeletal-cell junction scaffolding to localize 
itself properly for signaling functions that govern kidney cell differentiation 
and polarization, or whether these associations are more directly involved 
in disease etiology (or both). 


C. Association of IFs with the Actin-Rich Cortex 


The cortical region of many cells is enriched in actin and associated 
actin-binding proteins, which function in motility, cell shape mainte- 
nance, and membrane protein distribution in polarized cells. In some 
cases, discrete structures anchor actin to the membrane, as is the case for 
intercellular adherens junctions and cell-substrate focal contacts. In cer- 
tain special cell types, the fundamental blueprint for an adherens junction 
is taken to a new structural level, serving as scaffolding for cell-type specific 
complexes, such as the dystrophin-associated protein complex (DPC) in 
striated muscle. Although for years morphological studies have described 
a close association of IFs with the actin-rich cortex, recent advances in 
methods to study protein-protein interactions have provided new insight 
into the intimate structural and functional relationship between IFs and 
these membrane domains. 


1. Focal Contacts 


Fibroblasts and endothelial cells do not assemble hemidesmosomes. 
Rather, their matrix adhesive devices are so-called focal contacts. Pivotal 
components of focal contacts are heterodimeric integrins that bind to a 
variety of matrix molecules including fibronectin, vitronectin, and lami- 
nins (Brakebusch and Fassler, 2003; Hynes, 2002; Zamir and Geiger, 
2001). Classically, integrins associate with cytoplasmic actin microfila- 
ments through a number of scaffold and linker proteins. However, several 
reports indicate that focal contacts associate also with the vimentin 
IF network (Bershadsky et al., 1987; Gonzales et al., 2001). Although not 
definitively identified as a linker for vimentin at focal contacts, plectin is a 
likely candidate, particularly the murine plectin isoform 1f (Rezniczek 
et al., 2003). Another potential connector for vimentin in integrin-based 
contacts is MICAL, a HEF1/CasL interacting protein (Suzuki et al., 2002). 
HEF1/CasL is a member of the Cas family, whose members are docking 
proteins involved in integrin signaling in a broad range of cellular 
processes (O’Neill et al., 2000). Its expression is highest in T cells, B cells, 
and epithelia (O’Neill et al., 2000), but it is also present in fibroblasts and 
focal contacts of adherent cells (Law et al., 2000). 

Are vimentin-associated matrix adhesive devices in nonepithelial 
cells hemidesmosome equivalents? Several lines of evidence support 
this possibility. Although vimentin knockout mice show no obvious defects 
in development, cells derived from those mice display several defects. 
Wound healing defects in vimentin knockout mice are thought to be 
caused by failure of mesenchymal collagen contraction (Eckes et al., 
2000). Vimentin-deficient fibroblasts show impaired mechanical sta- 
bility, motility, and directional migration, leading to the proposal that 
the lack of vimentin induces aberrant organization of focal contacts in 
the null cells (Eckes et al., 1998). Experimental support for this pro- 
posal comes from studies in cultured endothelial cells in which vimentin 
knockdown resulted in reduced focal contact size, even though the micro- 
filament and microtubule cytoskeleton systems showed no obvious irregu- 
larities (Tsuruta and Jones, 2003). Furthermore, when exposed to fluid 
shear stress, the vimentin-deficient cells showed impaired adhesion to 
the substrate. Intriguingly, hemidesmosomes lacking a connection to 
keratin are also reduced in size (Gache et al., 1996; Guo et al., 1995). 
Collectively, these observations suggest that vimentin IFs may regulate 
cell adhesion in a manner parallel to that of keratins, which 
stabilize hemidesmosome adhesion to the basement membrane (Jones 
et al., 1998). 


2. Muscle Cell Surface Attachments for IFs 


In striated skeletal and cardiac muscle, the desmin IF network forms 
an exosarcomeric lattice surrounding the myofibrils. Parallel filaments 
link Z-disks within a myofibril, and a prominent series of transverse fila- 
ments link neighboring myofibrils at the level of the Z-disks (Price and 
Sanger, 1979). In addition, transverse IFs extend out to the plasma mem- 
brane or sarcolemma, where they interact with cortical structures called 
costameres that act as sites of force transmission (Danowski et al., 1992; 
Pardo et al., 1983). Costameres are cortical densities comprising numerous 
proteins that circumferentially line up in register with the Z-disks of 
peripheral myofibrils and link the force-generating sarcomeres with the 
plasma membrane (Ervasti, 2003) (Fig. 4). The desmin network is critical 
for maintaining muscle structure and function in response to stress, as loss 
or mutation leads to cardiomyopathy and muscle dystrophy in highly used 
skeletal muscles (Carlsson and Thornell, 2001). 


[Figure 4 schematic; recoverable labels: Sarcoglycan complex, Desmin IF, Costamere.] 

Fig. 4. Structure of striated muscle costameres and the DPC. A single membrane- 
associated costamere from a portion of a striated muscle fiber is magnified above to 
show the components of the dystrophin-associated protein complex that are involved in 
linking desmin intermediate filaments (IFs) to the muscle cell membrane. Additional 
actin-associated proteins present at these sites (including vinculin, talin, spectrin, and 
ankyrin) are not shown here. In addition to components of the DPC, plectin 
has also been localized to costameres, and likely contributes to linking desmin IFs to 
actin-associated structures. 

Costameres have been compared to focal contacts or adherens-type 
junctions that, in fibroblasts, link actin to the membrane. Like focal 
contacts, costameres contain actin-associated proteins such as vinculin, 
talin, spectrin, and ankyrin (Craig and Pardo, 1983; Small et al., 1992). 
In addition, they act as major integrators of the striated muscle IF cyto- 
skeleton. In smooth muscle, dense plaque material forms riblike arrays 
over the cell surface (Small, 1995) that, like costameres, provide anchor- 
age sites for actin and desmin IFs (Small and Gimona, 1998). One of the 
key assemblies localized to the costamere is the dystrophin-associated 
protein complex (DPC), a collection of structural and signaling pro- 
teins critical for mechanical integrity of muscle. This complex is a fre- 
quent target of human mutation, resulting in a variety of muscular 
dystrophic diseases (Blake and Martin-Rendon, 2002). A key coordinat- 
ing component in the IF-specific portion of the costamere and the DPC is 
α-dystrobrevin. 


3. α-Dystrobrevin and the IF-Associated DPC 


Dystrobrevins are a small family of dystrophin-related proteins encoded 
by two genes, one of which, α-dystrobrevin, encodes at least three pro- 
teins, all expressed in cardiac and skeletal muscle. α-Dystrobrevin is a key 
component of the DPC, and its loss results in neuromuscular junction 
defects and muscular dystrophy. It is a cytoplasmic protein indirectly 
linked with the transmembrane sarcoglycan components via dystrophin 
and other components of the DPC (Fig. 4) (Blake and Martin-Rendon, 
2002). Recently, two novel type IV IF proteins—syncoilin (Newey et al., 
2001) and desmuslin (Mizuno et al., 2001)—were both discovered in a 
yeast two-hybrid screen for α-dystrobrevin binding partners and shown to 
associate with this key component of the DPC (Blake and Martin-Rendon, 
2002). 

Syncoilin is highly expressed in skeletal and cardiac muscle and 
is localized to the neuromuscular junction, sarcolemma, and Z-lines. 
Likewise, desmuslin is expressed in heart and skeletal muscle and localized 
at Z-lines. It was shown that syncoilin and desmin interact directly, but do 
not coassemble into filaments; in fact, evidence suggests that syncoilin 
does not participate in filament formation at all. It was proposed that 
syncoilin helps anchor the desmin IF network at the sarcolemma and the 
neuromuscular junction (Poon et al., 2002). More recent work has ana- 
lyzed patients with a desmin-related cardiomyopathy, in whom desmin 
accumulation is accompanied by an upregulation of syncoilin and accu- 
mulation of other elements of the DPC. These defects were correlated with 
a disappearance of both α-dystrobrevin-1 and neuronal nitric oxide 
synthase (nNOS) from the membrane, possibly contributing to ischaemic 
injury. The authors suggest that the α-dystrobrevin-syncoilin-desmin inter- 
action may be a key contributing factor to loss of α-dystrobrevin from the 
membrane, and subsequent loss of nNOS (Howman et al., 2003). 

The desmuslin sequence is similar to that of synemin (Hesse et al., 
2001), a desmin-binding protein originally identified as an IFAP (Granger 
and Lazarides, 1980), but which now—based on sequence homology 
comparisons—is considered to be a bona fide IF (Bellin et al., 1999). The 
similarity between desmuslin and synemin raises the possibility that des- 
muslin is the human orthologue of synemin (Hesse et al., 2001). Two 
synemin isoforms resulting from the alternative splicing of a single gene 
have been termed α- and β-synemin, and a third isoform was recently 
identified in humans (Xue et al., 2004). β-Synemin appears to be the 
predominant form in striated muscle (Titeux et al., 2001), where it colo- 
calizes with costameres and muscle Z-lines, and is enriched at the neuro- 
muscular and myotendinous junctions (Mizuno et al., 2004). α- and 
β-synemin levels are comparable in smooth muscle. Synemin has also been 
shown to interact with myofibrillar Z-line α-actinin and the costameric 
protein vinculin (Bellin et al., 1999). Thus, desmuslin/synemin may help 
to link the muscle cell IF desmin to myofibrillar Z-lines and costameres at 
the muscle plasma membrane (Bellin et al., 1999, 2001). 

Naturally occurring human mutations in the DPC are beginning to 
reveal discrete functions for the component parts of this IF-associated 
protein meshwork. For instance, whereas disruption of dystrophin leads 
to muscular dystrophy accompanied by sarcolemmal damage, ablation of 
α-dystrobrevin results in muscular dystrophy accompanied by minimal 
membrane damage (Grady et al., 1999). Although the underlying 
basis of these differences is not fully understood, it is proposed that the 
α-dystrobrevin-IF network is involved in lateral force transmission through 
the costamere, while other components are sufficient to maintain 
membrane integrity (Ervasti, 2003). 

Although much of the focus has been on the DPC of striated muscle, it 
is likely that desmin attachments to dense plaques of smooth muscle play 
critical roles in regulating the transmission of contractile forces in this 
tissue as well. This is particularly relevant in light of the observed defects in 
smooth muscle of desmin-deficient mice, in which active force per cross- 
sectional area was reduced to 40% of control values 
(Sjuve et al., 1998). IFAP candidates for serving this linking function are 
plectin and other components of the actin-rich cortex, including calponin 
(which also plays a role in the cytoplasm of smooth muscle cell dense 
bodies; see below), and the spectrin/ankyrin complex. 


4. Spectrin and Ankyrin 


In addition to ABD-containing proteins such as plectin, ankyrin and 
spectrin/fodrin (nonerythrocyte spectrin) may contribute to linking IFs to 
the actin-rich cortex in both muscle and nonmuscle cells. A number of 
early studies pointed to these proteins as potential IF linkers; however, a 
caveat of these studies is that many were performed prior to widespread 
use of recombinant proteins and involved the use of purified or enriched 
preparations that could have been contaminated with other intermediary 
molecules. The nonerythrocyte form of spectrin was implicated early on to 
play such a role when Mangeat and Burridge (1984) demonstrated that 
microinjection of an antibody directed against spectrin resulted in an 
altered distribution of vimentin-containing IF in living fibroblasts. Howev- 
er, it remained possible that these reported IF-spectrin interactions could 
be indirect, as Wiche and colleagues demonstrated that the IFAP plectin 
interacts with spectrin and the 240 kDa chain of fodrin (nonerythrocyte 
spectrin), and thus could serve as the link between IF networks and 
spectrin (Herrmann and Wiche, 1987). A more direct role for spectrin 
binding to desmin IF was implicated from in vitro experiments by Langley 
and Cohen (1986). These investigators reported that desmin filaments 
and erythrocyte spectrin cosediment. In addition, electron microscopy was 
used to demonstrate the interaction of desmin IF via multiple lateral 
associations with IOV (inside-out membrane vesicles from erythrocytes). 
However, the possibility of a contaminant such as ankyrin (see below) that 
might mediate an interaction between spectrin and desmin IF was not 
ruled out. The same investigators also published in vitro data to suggest 
that spectrins from erythrocytes versus spectrins from the brain bind 
with differing affinities to vimentin IFs and neurofilaments (Langley and 
Cohen, 1987). Later experiments went on to demonstrate an interaction 
between the β-subunit of brain spectrin and the NF-L rod; these investi- 
gators made efforts to verify that their spectrin preparations were not 
contaminated with other minor components, such as ankyrin (Frappier 
et al., 1991, 1992). 

In apparent contrast to the above results, Georgatos and Marchesi 
(1985) demonstrated that unpolymerized lens vimentin was capable of 
binding in a saturable manner to human erythrocyte IOVs stripped of 
actin and spectrin. Removal of ankyrin, a protein that links the actin- 
spectrin complex to the erythrocyte membrane, significantly lowered 
binding and a polyclonal antiankyrin antibody blocked 90% of the bind- 
ing. These investigators went on to demonstrate that vimentin IF sub- 
units associate with human erythrocyte plasma membranes (Georgatos 
et al., 1985), and desmin and vimentin IF subunits interact with avian 
erythrocyte plasma membranes (Georgatos and Blobel, 1987b; Georgatos 
et al., 1987) via their N-terminal head domains, an interaction that may 
also involve synemin (Granger and Lazarides, 1982). Affinity chromatog- 
raphy was used to demonstrate an association between vimentin and 
nuclear lamins, and interestingly, the C-terminal tail of vimentin was 
shown to be critical for this association (Georgatos and Blobel, 1987a). 

More recent work suggested that one of four 4.1 proteins—4.1R—may 
associate with neurofilament proteins in forebrain postsynaptic densities, 
thus regulating the associated spectrin-rich cortex (Scott et al., 2001). 
Blot overlay analyses demonstrated that, in addition to spectrin and actin, 
postsynaptic density polypeptides included NF-L and α-internexin as 
interacting partners for 4.1R. Collectively, these studies emphasize that 
common themes are used in different cell types to both strengthen 
plasma membrane domains enriched in actin and IF polypeptides and 
to coordinate these sites with cytoplasmic architecture. 


IV. OTHER CROSSLINKERS NOT IN THE PLAKIN FAMILY 


A. Keratinocyte Crosslinkers and Bundlers 


1. Filaggrin 


In contrast to the emerging family of cytolinker proteins that link IFs to 
other cytoskeletal elements and structures, proteins that bundle IFs are 
less diverse and seem to be limited to keratin-associated proteins, such as 
filaggrin and trichohyalin (reviewed in Coulombe et al., 2000). Filaggrin 
was identified in the early 1980s as a highly basic, histidine-rich protein 
abundant in the upper differentiated layers of the skin (reviewed in Dale 
et al., 1993; Presland and Dale, 2000; Steinert et al., 1981). It is synthesized 
in the granular layer as a larger, highly phosphorylated precursor that is 
stored in keratohyalin granules in the suprabasal, granular layer of kera- 
tinizing epithelia (including epidermis, gingiva, and tongue). This precur- 
sor form, called profilaggrin, consists of tandemly arrayed (in humans, 
10-12; Gan et al., 1990) repeating units of the filaggrin monomer, sur- 
rounded by partial repeats and nonrepeat sequences at either end. The 
N-terminus consists of an A-domain comprising two functional calcium- 
binding EF hands and a cationic B-domain (Markova et al., 1993; Presland 
et al., 1992, 1995). It has been proposed that this domain may have 
multiple functions in sequestering calcium (possibly facilitating the calci- 
um-dependent processing of profilaggrin to filaggrin) and in regulating 
the solubility of the protein and consequently keratohyalin granule 
formation. 

During terminal differentiation of keratinocytes of skin into stratum 
corneum, profilaggrin is dephosphorylated and proteolytically processed 
into single filaggrin proteins of 37 kDa (in humans) by means of several 
proteases (Pearton et al., 2001; Resing et al., 1995; Yamazaki et al., 1997). 
Filaggrin then crosslinks keratin filaments into macrofibrils orientated 
parallel to the surface of the skin (Dale et al., 1978; Manabe et al., 1991; 
Steinert et al., 1981). Filaggrin binds to the a-helical rod domain of keratin 
(Dale and Presland, 1999). This interaction is believed to depend solely on 
ionic interactions between the conserved positive and negative charges 
on the β-turns of filaggrins on one hand and along the rod domains of 
IFs on the other (Mack et al., 1993). Filaggrin association with keratin may 
facilitate disulfide bond formation between keratin polypeptide chains 
during the remodeling that occurs in upper layers, which in turn may 
allow their survival during terminal differentiation (Dale et al., 1994; 
Presland and Dale, 2000). The in vitro finding of ordered assembly of 
keratin macrofibrils by filaggrin is supported by the effects of filaggrin 
expression in cultured cells, where overexpression of the monomer leads 
to the collapse of keratin and vimentin filament networks (Dale et al., 
1997; Kuechle et al., 1999) and impairs cell-cell adhesion (Presland 
et al., 2001). In contrast, profilaggrin is not able to aggregate keratin 
filaments (Lonsdale-Eccles et al., 1982). 

Filaggrin is further processed into free amino acids as terminal differ- 
entiation proceeds (Scott and Harding, 1986), leaving only a small amount 
as a component of the cornified envelope (Steinert and Marekov, 1995; 
Simon et al., 1996). Of the free amino acids, arginine is deiminated 
enzymatically (by peptidyl arginine deiminase; Tarcsa et al., 1996), whereas 
glutamine is modified to pyrrolidone carboxylic acid (Thulin and Walsh, 
1995), a highly hygroscopic compound. Although somewhat controversial, 
the free and modified amino acids have been proposed as an important 
factor in maintaining stratum corneum moisturization (reviewed in 
Rawlings and Harding, 2004). 


2. Trichohyalin 


Trichohyalin is a 248-kDa highly charged protein that is predomi- 
nantly expressed in specialized epithelia, such as the inner root sheath 
(IRS) of the hair follicle, which are tailored for resisting mechanical stress. 
Trichohyalin is also present at lower levels in the filiform papillae of the 
tongue and some cells of the granular layer of epidermis (Lee et al., 1993; 
O’Keefe et al., 1993; Presland and Dale, 2000; Rothnagel and Rogers, 
1986). Trichohyalin contains extensive repeat sequence blocks, which are 
predicted to form an elongated, flexible single-stranded a-helical rod (Lee 
et al., 1993). These predictions are consistent with biochemical and elec- 
tron microscopical analysis showing an extended shape with overall di- 
mensions of 85 nm. Similar to profilaggrin, the N-terminus comprises an 
S-100 protein-like calcium-binding domain. 

Initially, trichohyalin accumulates in cytoplasmic granules in IRS cells, 
and can colocalize with filaggrin in granules in some cases (Ishida- 
Yamamoto et al., 1997; Manabe and O’Guin, 1994; O’Keefe et al., 1993). 
As differentiation proceeds, trichohyalin associates with IFs (O’Guin et al., 
1992; Rothnagel and Rogers, 1986), a redistribution that may depend on 
posttranslational modification of arginyl residues to citrulline by peptidyl- 
arginine deiminase, which renders the protein more soluble and rigid in 
vitro (Tarcsa et al., 1996, 1997). 

Trichohyalin is thought to contribute to the scaffolding on which the 
cell envelope is organized and to anchoring keratin IFs to the cell enve- 
lope in the IRS. Extensive crosslinking occurs via the action of transglu- 
taminases with the formation of Nε-(γ-glutamyl)lysine isopeptide 
crosslinks (Harding and Rogers, 1971, 1972) between trichohyalin and 
the proteins of the cornified envelope in the IRS (including epiplakin 
and small proline-rich proteins; Steinert et al., 2003) and keratin (Dale 
et al., 1978; Steinert et al., 2003). It has been proposed that the latter 
connection is facilitated by ionic interactions between trichohyalin and 
keratin IFs, as the covalent bonds form specifically at two domains (6 and 
8) of trichohyalin and the head and tail domains of keratin (Steinert et al., 
2003). Thus, in IRS cells, trichohyalin serves not only as a crosslinking 
component of the cornified envelope and keratin IFs, but also as a linker 
between the two structures (Steinert et al., 2003). In contrast, in the 
medulla of the hair, keratin filaments are virtually absent (Powell and 
Rogers, 1997). There, synthesis of trichohyalin granules continues until 
they undergo citrulline conversion and transglutaminase crosslinking, 
finally fusing to form a solid mass (Harding and Rogers, 1976; Powell 
and Rogers, 1997). 


3. Keratin Associated Proteins 


Keratin associated proteins (KAPs) represent a large group of proteins 
that are expressed upon terminal differentiation of hair cells (Rogers and 
Powell, 1993). KAPs are grouped in three clusters consisting of over 80 
proteins in 23 families (Rogers et al., 2002, 2004). KAP molecular weights 
are predominantly in the 15 to 25 kDa range, but can be as low as 5 kDa 
and as high as 53 kDa. The families are categorized into high and ultra- 
high sulphur families with cysteine contents of less than 30 mol% and up 
to 36.8 mol%, respectively, and high glycine-tyrosine families with glycine- 
tyrosine residues making up to 67.8 mol% (KAP20.1) of the amino acid 
composition (Rogers et al., 2002). The amino acid sequences of the 
members of most protein families consist of unique repetitive stretches 
with repeat unit lengths ranging from 6 to 48 amino acids (Rogers 
and Powell, 1993). The onset of KAP expression generally follows 
that of keratin IF and is temporally and spatially regulated in a protein 
family-specific manner. 

In macrofibrils of the hair, KAPs comprise the amorphous matrix that is 
believed to crosslink keratin bundles in the mature hair (Powell and 
Rogers, 1997). How KAPs contribute to this crosslinking phenomenon 
remains controversial. Disulphide bonds between keratin tails and matrix 
proteins, as well as within and in between different matrix proteins, may 
play a role. Alternatively, apolar interactions between glycine loops in 
glycine-tyrosine-rich KAPs and type II keratins may take place (Powell and 
Rogers, 1997). 

Intriguingly, KAP-related genes pmg-1 and pmg-2 have been identified 
in other epidermal appendages including mammary, sebaceous, and sweat 
glands, in addition to growing hair follicles in skin (Kuhn et al., 1999). 
Thus, KAPs may play a larger role in the differentiation of multiple 
epithelial appendage cell types. 


B. Proteins that Coordinate with Actin 


Several proteins that were originally identified as actin-binding proteins 
have recently been uncovered as having dual functions in regulating the 
organization or assembly state of type III IFs. A common theme in these 
studies is the role of calponin homology (CH) domains—already discussed 
above as having a prominent role in plakin family structure—in binding 
to IFs. These studies raise the possibility that CH domains enhance the 
cytolinker status of plakins by combining both actin- and IF-binding 
properties within a single domain. The following section discusses other 
actin-binding proteins that exhibit IF-binding properties. 


1. Calponin 


Calponin is a widely distributed actin-binding protein thought to play a 
role in smooth-muscle contraction by inhibiting actomyosin ATPase activ- 
ity (Small and Gimona, 1998). Its association with dense bodies of smooth 
muscle cells and widespread presence in other tissues led investigators to 
speculate that it may play additional roles in cytoskeletal organization 
(Nishida et al., 1990; Wang and Gusev, 1996). Dense bodies in smooth 
muscle cells are primary attachment sites for desmin IF (Cooke, 1976). 
Consistent with a more general importance in organizing cytoskeletal 
networks, calponin colocalizes (Mabuchi et al., 1996; North et al., 1994) 
and copurifies with desmin from dense bodies of smooth muscle cells 
(Mabuchi et al., 1997). It is also present in the membrane skeleton of 
smooth muscle, as discussed above. Calponin coprecipitates with (Wang 
and Gusev, 1996), and possibly bundles, desmin (Nakagawa et al., 1993), 
and is capable of incorporating into desmin filaments in vitro (Mabuchi 
et al., 1997). However, the authors observed that at low ionic strength, 
calponin binds with higher affinity to tetrameric desmin than to filaments. 
The likely IF binding site is located in the N-terminal 22 kDa fragment 
(Wang and Gusev, 1996) containing a CH domain. Collectively, these data 
raise the possibility that calponin may function to bridge IFs with actin in 
dense bodies. 


2. Fimbrin 


Fimbrin or plastin is an actin-bundling protein (Bretscher, 1981; Glen- 
ney et al., 1981; Namba et al., 1992) consisting of an N-terminal calcium- 
binding domain followed by four calponin-homology domains. Of the 
three isoforms, T-fimbrin/plastin has a broad tissue distribution, whereas 
the I- and L-isoforms are restricted to the intestine, and hematopoietic 
cells and certain carcinomas, respectively (Lin et al., 1993, 1994). L-fim- 
brin/plastin was found to coimmunoprecipitate with vimentin from a 
detergent extract of adherent macrophages, and the N-terminus of vimen- 
tin bound to a 45-residue fragment of the CH1-domain in overlay assays 
(Correia et al., 1999). Fimbrin/plastin most likely binds to tetrameric 
rather than filamentous vimentin. The in vivo interaction, observed by 
colocalization in immunostained macrophages, occurred in podosomes 
and filopodia during early adhesion events; it was proposed that fimbrin 
may direct vimentin assembly at these adhesion sites. 


3. Nebulin 


Ultrastructural studies have long suggested that desmin IFs are linked in 
some way to the Z-line in striated muscle; however, the molecular basis of 
this association has not been well defined. A good candidate for 
performing this function is a protein called nebulin, a component of 
the thin filament in skeletal and cardiac striated muscle (McElhinny 
et al., 2003). Nebulin spans from the pointed end of the thin filament 
where it binds to tropomodulin (McElhinny et al., 2001), to the 
Z-disk where it binds to actin, tropomyosin, and troponins. Thus, it is 
speculated that nebulin functions both as a molecular ruler regulating 
thin filament length (Kruger et al., 1991) and as a scaffold for sarcomeric 
thin filaments (McElhinny et al., 2003). However, recent data also suggest 
that nebulin may have additional functions, ranging from signal transduc- 
tion to myofibril force generation. 

Nebulin is divergently spliced to give rise to a number of isoforms 
with molecular masses of 750-850 kDa, which may provide different tem- 
plates to tailor sarcomere structure for various physiological functions 
(Kazmierski et al., 2003). Nebulin’s N-terminus associates with the thin 
filament, and its C-terminus is anchored at the Z-disk (see Fig. 2 in 
McElhinny et al., 2003). It is the C-terminal region—specifically modules 
referred to as M163-M170—that localizes to the Z-disk and binds to the 1B 
helix of the desmin rod (Bang et al., 2002). As desmin laterally links Z-disks 
together, this interaction is speculated to integrate myofibrils with the 
sarcolemma and other components of muscle cells (McElhinny et al., 
2003) and to maintain Z-lines in register (Bang et al., 2002). The potential 
importance of this interaction is highlighted by the existence of patients 
with truncation mutations of the nebulin gene that cause (recessive) 
nemaline myopathy (Pelin et al., 1999). 


V. IFAPs AS PARTICIPANTS IN A SIGNALING SCAFFOLD AND 
CELLULAR METABOLISM 


In addition to structural roles, it is becoming increasingly apparent that 
a major function of IFs is to regulate the activities of cell metabolism, by 
serving as scaffolding for the dynamic regulation of associated proteins 
such as enzymes, signaling adapter molecules, stress proteins, cell death 
receptors, and even the endocytic machinery. 


A. Stress Proteins/Chaperones 


The role of the IF, and particularly the keratin filament system, in 
resisting the forces of mechanical stress has been well established. Howev- 
er, IFs also play a role in countering metabolic stress. Perhaps the best 
example is the cytoprotective role played by the simple epithelial keratins, 
K8/18. However, vimentin, desmin, peripherin, GFAP, the lens proteins 
phakinin and filensin, and other keratins have also been shown to associ- 
ate with members of the small heat shock protein (HSP) family, including 
HSP27 and αB-crystallin (reviewed in Coulombe and Wong, 2004; 
Marceau et al., 2001; Nicholl and Quinlan, 1994). 

Subjecting cells to various forms of stress has been shown to dramati- 
cally reorganize IF networks, concomitant with the redistribution of nor- 
mally soluble αB-crystallins to these networks, resulting in resistance to 
detergent extraction. Based on these and similar observations, it has been 
suggested that αB-crystallin may function as a molecular chaperone for IF 
proteins (Djabali et al., 1997, 1999). In addition, in vitro biochemical 
analyses have suggested that the association of IFs with chaperones 
may regulate the organization of filaments by protecting them from non- 
covalent interactions that could lead to IF aggregation (Perng et al., 
1999a). Such a function might explain why mutations in αB-crystallin 
could underlie the abnormal aggregation seen in certain myopathies 
and in lens cataract formation (Perng et al., 1999b; Vicart et al., 1998). 
Keratin 8/18 also associates with other stress-related chaperone-like mo- 
lecules, including HSP/c70 (Liao et al., 1995; Napolitano et al., 1985), the 
human 78-kDa glucose-regulated protein (grp78) (Liao et al., 1997), and 
more recently, Mrj, a DnaJ/HSP40 family protein (Izawa et al., 2000). As 
Mrj binds directly to K18 and HSP/c70, it was proposed that it may 
act as an adapter to link K8/18 to the HSP/c70 chaperone complex. The 
fact that microinjection of anti-Mrj antibody disrupted K8/18 networks 
further supports the contention that these chaperone complexes play 
an important role in regulating keratin organization. The importance 
of IFs in protecting the liver from metabolic stress is reflected by the 
fact that mutations in K8/18 predispose humans to a variety of chronic 
liver diseases (Ku et al., 1996, 2001, 2003a; Owens et al., 2004; Zatloukal 
et al., 2000). 


B. Proteins that Regulate Apoptosis 


Among the more recently recognized IFAPs are enzymes and receptors 
involved in the apoptotic pathway (reviewed in Coulombe and Wong, 
2004; Oshima, 2002). Nuclear lamins (Lazebnik et al., 1995), type I kera- 
tins (Caulin et al., 1997), vimentin (Byun et al., 2001), and desmin (Chen 
et al., 2003) are all caspase substrates, and evidence suggests that caspase- 
generated IF fragments can promote the apoptotic cascade (Byun et al., 
2001). Furthermore, recent work has shown that adapter elements of the 
apoptotic machinery, ubiquitinated DEDD and DEDD2, form a scaffold- 
ing dependent on K8/18 and active caspase 3 (Dinsdale et al., 2004; Lee 
et al., 2002). This observation raises the possibility that IFs might partici- 
pate in optimally positioning downstream effectors in the cell death 
pathway. 

In addition to these downstream enzymatic effectors, IFs interact with 
receptors in the death pathway. K8 and K18 interact directly with the 
cytoplasmic domain of tumor necrosis factor receptor 2 (TNFR2) and 
attenuate TNF-induced, Jun NH(2)-terminal kinase (JNK) signaling, 
NF-κB activation, and cell death (Caulin et al., 2000). Modulation of cell
surface targeting of Fas has also been observed in cultured cells lacking
K8/K18, the loss of which sensitizes cells to Fas-mediated apoptosis 
(Gilbert et al., 2001), in part by regulating levels of c-Flip, a protein that 
governs the antiapoptotic ERK1/2 signaling pathway in these cells (Gilbert 
et al., 2004). The observation that c-Flip physically interacts with K8 and 
Raf-1 kinase—the latter of which is required for Fas activation of the 
ERK1/2 pathway—supports the idea that these proteins participate in 
an antiapoptotic complex. These findings are also in line with the ob- 
served sensitivity of transgenic mice with a keratin 18 mutation to Fas- 
mediated apoptosis (Ku et al., 2003b). A role for K8/18 in governing the 
targeting of cell surface proteins to their correct domains is supported by 
studies demonstrating altered distribution of cell surface proteins in 
intestine and liver of CK8-deficient animals (Ameen et al., 2001b). Recently,
a direct interaction between both K14 and K18 and the TNFR1-associated
death domain protein (TRADD) was observed. The authors showed
evidence that keratins can compete with ligand-activated TNFR1 for binding
with TRADD (Inada et al., 2001). These data provide an appealing
explanation for how the presence of keratins could attenuate cell death, as 
keratin binding to TRADD would interfere with caspase 8 activation. 


C. Interaction of IF with Kinases and Cell Cycle Machinery 


Several lines of evidence suggest that IFs serve as scaffolding for intra- 
cellular kinases, which not only regulate the phosphorylation and assem- 
bly state of IFs, but also govern kinase activity and, in some cases, cell 
proliferation and differentiation. 

Cdk5 has long been known to be an important modulator of the 
neurofilament network, which is highly sensitive to regulation by phos- 
phorylation (Grant et al., 2001). As its name implies, cdk5 is a cyclin- 
dependent kinase; however, it has not been found to be involved in cell 
cycle regulation per se, but is instead primarily active in postmitotic cells 
such as neurons. Phosphorylation of NF proteins regulates their assembly 
state, association with other cytoskeletal elements, and transport within 
neurons. Yeast two-hybrid analysis has suggested that Cdk5 may
associate with NF-H via its activator p35, which associates directly with 
the C-terminal region of the NF-M rod domain (Qi et al., 1998). More 
recently, another type IV IF protein, nestin, was identified as a novel in vivo 
target for cdk5/p35 kinase. The physical association between nestin and 
the kinase complex is regulated by the differentiation state of myoblasts. 
The authors suggest there exists a continuous turnover of cdk5 and p35 
activity on a nestin scaffold that affects the organization and function of 
both cdk5 and nestin during development (Sahlgren et al., 2003). 

IFs of the nervous system are not the only ones that serve as binding sites
for kinases, which in turn regulate their assembly state. K8/18 have
recently been shown to associate with mitogen-activated protein kinase
p38; K8 is specifically phosphorylated at a single site, Ser-73, in a physiologic
manner, causing keratin IF reorganization (Ku et al., 2002a). Introduction
of a human disease-causing mutation in close proximity to this
residue increases phosphorylation, suggesting that keratin hyperphosphorylation
may contribute to aberrant IF organization and disease
pathogenesis.

Association of differentiation-specific keratins with cell cycle machinery 
also influences proliferation in the epidermis. Paramio and colleagues first 
demonstrated that K10 inhibits, while K16 promotes, the proliferation of 
human keratinocytes by acting in some manner on the retinoblastoma 
(Rb) pathway (Paramio et al., 1999). They later demonstrated that
this occurs through a physical interaction of K10 with Akt (protein kinase
B [PKB]) and atypical PKCζ. Sequestration of these kinases inhibits their
intracellular translocation and interferes with their activation, consequently
impairing pRb phosphorylation, reducing the expression of cyclins D1
and E (Paramio et al., 2001) and reducing NF-κB activity (Santos et al.,
2003). Consistent with the observed ability of K10 to interfere with these 
cell cycle pathways are in vivo studies showing that mice ectopically ex- 
pressing K10 in the proliferating basal layers of the epidermis exhibit a 
hypoplastic and hyperkeratotic epidermis. The association of IFs with 
adapter proteins such as 14-3-3 also has a potential impact on regulation 
of the cell cycle machinery, as detailed in the section that follows. 


D. 14-3-3 Proteins 


14-3-3 proteins are adapters that interact with specific phosphoserine or 
threonine motifs in a plethora of cell cycle proteins, transcription factors, 
and other binding partners involved in signal transduction cascades 
(Bridges and Moorhead, 2004). It is thought that binding of 14-3-3 pro- 
teins to their partners either changes protein conformation, physically 
masks certain features, or serves a scaffolding function (Bridges and 
Moorhead, 2004). The protein 14-3-3 has been shown to interact directly 
with both vimentin and keratin IF. Cell cycle dependent phosphorylation 
of K18 leads to its association with 14-3-3 (Ku et al., 1998), which in turn 
modulates keratin filament organization and acts as a solubility cofactor 
(Liao and Omary, 1996). But 14-3-3 also regulates mitotic progression of 
cells expressing K8/18, such as hepatocytes (Ku et al., 2002b). In mice 
lacking K8/18, cell cycle regulation is disturbed and 14-3-3 is redistributed 
to nuclei. This change in localization could, in turn, alter the association
of 14-3-3 with other important cell cycle regulators (Coulombe and Wong, 
2004; Toivola et al., 2001). Also, 14-3-3 interacts with phosphovimentin, 
displacing the 14-3-3 partner Raf, which can no longer be activated by EGF 
(Tzivion et al., 2000). In general, it seems that interaction of 14-3-3 with 
IFs interferes with other effector pathways. In an interesting twist, 
studies have shown that the 14-3-3 binding partner Raf can also bind
directly to K8/18, an interaction that blocks keratin–14-3-3 binding
in a reconstitution assay (Ku et al., 2004). The authors’ results suggest
that K8/18 can sequester a unique phosphospecific population of 
Raf, consequently regulating its signaling functions in a manner that is 
analogous to 14-3-3’s regulation of Raf-1. Whether IFs might also in
some cases facilitate 14-3-3-dependent interactions with effectors awaits 
further study. 


E. Interaction with Lipids and Membranes: Involvement in 
Membrane Trafficking 


Reports dating back to the mid 1980s describe a potential direct associ- 
ation between vimentin and membrane lipids (Traub et al., 1985, 1987). 
Other observations suggested that vimentin may play a novel role 
in differentiating 3T3 cells during adipose conversion by participating in 
the compartmentalization of the forming lipid globules (Franke et al., 
1987). More recent work utilizing vimentin-deficient cells has demon- 
strated that the intracellular movement of LDL-derived cholesterol from 
the lysosome to the site of esterification is impaired (Sarria et al., 1992) 
and the cells show defects in glycolipid synthesis, possibly due to a defect 
in transport of glycosphingolipids between the endosomal/lysosomal pathway
and the Golgi apparatus. A direct interaction between vimentin and
the Golgi complex through the peripherally associated Golgi protein 
formiminotransferase cyclodeaminase (FTCD) has been demonstrated. 
Neither the interaction nor the ability of FTCD to promote filament
assembly requires enzyme activity (Gao et al., 2002).

The observation that cell surface marker distribution and transport of 
components in the apoptotic cascade are altered in K8/18 deficient cells 
(Ameen et al., 2001a; Gilbert et al., 2001) raises the possibility that IFs play 
a more general role in the organization of intracellular membranes and 
trafficking. Supporting this idea, region-specific differences in the distribution
of syntaxin-3, a central player in the SNARE machinery involved
in vesicular transport, were observed in the intestines of K8 null cells. 
Furthermore, the SNARE SNAP23 associates with vimentin filaments in 
primary fibroblasts and HeLa cells (Faigle et al., 2000). 

Very recently, a novel association between rat brain peripherin and
AP-3, an adapter protein involved in the formation of lysosomes,
lysosome-related organelles, and synaptic vesicles, was identified. The authors
extended their work to show that other type III IFs, including α-internexin
and vimentin, also associate with AP-3 (Styers et al., 2004). The authors went
on to show that in vimentin-deficient fibroblasts, AP-3 and late endosomal/ 
lysosomal markers are redistributed and AP-3-dependent sorting is im- 
paired. Together, these data suggest that the IF cytoskeleton participates 
in regulating membrane architecture and compartmentalization, as well as 
trafficking events in the endolysosomal pathway. 


VI. MOTORS THAT MOVE IFs


Some IFAPs are motor molecules that play a role in positioning IFs in 
various tissues and in regulating the dynamic nature of the IF cytoskeleton 
(reviewed in Chang and Goldman, 2004; Helfand et al., 2004). Far from 
being static struts, IFs and their precursors have been shown to be
dynamic elements of the cytoskeleton that are whisked around the cell by 
molecular motors. Some of the first data demonstrating that IFs exist as 
precursor particles that are moved around the cell came from studies in 
the Goldman lab. This group demonstrated that GFP-vimentin incorpo- 
rates into small vimentin particles in BHK fibroblasts that colocalize with 
the anterograde MT motor kinesin and move rapidly in a kinesin-dependent
manner (Yoon et al., 1998). The kinesin heavy chain and a specific 62-kDa
light chain appear to be required for mediating interactions with vimentin
(Avsiuk et al., 1995; Liao and Gundersen, 1998). The particles eventually 
coalesce into larger structures referred to as squiggles, which in turn are 
postulated to assemble into longer filaments. Although anterograde move- 
ments are more common, movements can occur in both directions. More 
recently, the same group demonstrated that the dynein/dynactin complex 
is also associated with vimentin and is responsible for retrograde move- 
ment of particles, squiggles, and filaments (Helfand et al., 2002). Thus, a 
balance of plus and minus end movements is thought to be important for 
the homeostasis of the intermediate filament network. 

Vimentin is not the only IF type that is moved in this manner. Neural 
IFs, including both the type III peripherin proteins and type IV neural IF 
proteins, have also been demonstrated to be moved by molecular motors. 
In the case of peripherin, particles and squiggles were observed to trans- 
locate rapidly within PC12 cell bodies, neurites, and growth cones. The 
movements were bidirectional and dependent on microtubules, kinesin, 
and cytoplasmic dynein. The authors suggest that peripherin particles are 
part of a rapid transport system that delivers cytoskeletal subunits to distal
regions of neurites (Helfand et al., 2003). 

Studies have demonstrated that GFP-tagged NF-M and NF-H form 
particles and move in an MT-dependent fashion, similar to that demon- 
strated for peripherin (reviewed in Prahlad et al., 2000; Shah et al., 2000; 
Shea, 2000; Yabe et al., 1999). These studies also help to explain the 
underlying basis of slow axonal transport. NF particles and NF up to about 
16 μm actually move at high rates, but their transport is punctuated by
long pauses. The result is net movement consistent with slow axonal 
transport (Wang et al., 2000). Intriguingly, peripherin pauses two to three 
times less frequently than NF, in spite of similar MT-dependent mechan- 
isms of transport. Differences in IF structure—such as the existence of 
the long, highly charged tails on NF-M and NF-H that might act as cross- 
bridges to stop motility—and regulation of molecular motor association 
by phosphorylation could also account for these differences (Yabe 
et al., 2000). 
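
As a rough illustration of how fast but intermittent movement can yield a slow net rate, the time-averaged velocity can be taken as the speed while moving multiplied by the fraction of time spent moving. The short Python sketch below works through that arithmetic. The function name and the input values are hypothetical placeholders chosen for illustration; they are not measurements reported in the studies cited above.

```python
def net_velocity(moving_speed_um_per_s: float, fraction_time_moving: float) -> float:
    """Time-averaged velocity of a cargo that alternates between runs and pauses."""
    if not 0.0 <= fraction_time_moving <= 1.0:
        raise ValueError("fraction_time_moving must lie between 0 and 1")
    return moving_speed_um_per_s * fraction_time_moving


SECONDS_PER_DAY = 86_400
UM_PER_MM = 1_000

# Hypothetical inputs for illustration only (assumed, not taken from the text).
speed = 0.5       # micrometers per second while the cargo is moving
fraction = 0.05   # share of total time spent moving rather than paused

# Convert the time-averaged velocity from micrometers/second to millimeters/day.
rate_mm_per_day = net_velocity(speed, fraction) * SECONDS_PER_DAY / UM_PER_MM
print(f"{rate_mm_per_day:.2f} mm/day")
```

With these placeholder inputs the net rate comes to about 2 mm per day, showing how a cargo that moves quickly while in motion can still advance only a few millimeters per day overall if it spends most of its time paused.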

Myosin Va is also associated with NFs, with NF-L being the major 
binding partner (Rao et al., 2002). This association has been shown to 
be important in regulating the localization and content of a large 
pool of myosin Va in neurons, in addition to regulating NF number. 
Association with the actomyosin system provides another potential trans- 
port mechanism and an opportunity for more sophisticated modes of 
regulation within the peripheral nervous system. 

Nudel, first identified as a protein required for nuclear migration in 
filamentous fungi, was identified as a critical NF-associated protein in
mammals. Its loss results in disruption of NF organization and neurodegenerative
changes (Holzbaur, 2004; Nguyen et al., 2004). The results
suggest that Nudel is important for assembling the NF cytoskeleton through
its direct interaction with the central rod domains of NF-L. As Nudel also
interacts with dynein, it also seems possible that its role in assembly is 
coupled with MT-dependent transport of NF down the axon. Interfering 
with this process could potentially lead to NF accumulation and signs of 
neurodegeneration. 

Mutations in motor proteins or IFs themselves (which may alter their 
associations with IFAPs or motors) lead to accumulations of IFs in ALS, 
Charcot-Marie-Tooth disease type 2, and Parkinson’s disease (Goldstein and Yang,
2000; Helfand et al., 2004). Impaired assembly and transport of NFs is a 
critical determinant of neurodegenerative disease. Consistent with a criti- 
cal role for kinesin in vivo, mice lacking the neuronal-specific conventional 
kinesin heavy chain KIF5A were shown to have accumulations of NF-H, as 
well as NF-M and NF-L, in the cell bodies of peripheral sensory neurons. 
The presence of these accumulations was accompanied by a reduction in
sensory axon caliber. These data support the hypothesis that a conventional
kinesin plays a role in the microtubule-dependent slow axonal
transport of at least one cargo, the NF proteins (Xia et al., 2003). 

Keratin IFs exhibit different motile behaviors compared with type III 
and IV IFs, and continue to move in the absence of MT in cells treated 
with MT inhibitors, such as nocodazole. It has been proposed that actin 
may be involved in their transport (Yoon et al., 2001). Time lapse imaging 
of keratin-GFP has revealed that keratin IF formation begins in the cell 
cortex in a region enriched in actin and requires both intact MT and
actin, and that actin governs the organization and movement of keratin 
in Xenopus egg extracts (Weber and Bement, 2002). Less is known 
about specific mediators of these interactions, although the fact that 
myosin Va associates with NFs raises the interesting possibility that other 
such interactions may exist for keratins. 

Collectively, these observations suggest that multiple interactions be- 
tween motor proteins and various cell-type-specific IFs have evolved to 
correctly deploy IF precursors in the cytoplasm, where they can appropri- 
ately polymerize and perform their functions without interfering with 
other aspects of cellular metabolism. These precursors can take on a 
variety of forms, from particles to short filaments. It has been demon- 
strated that different-sized IFs move at different speeds, but it seems 
unlikely that these differences in rates of movement are directly related 
to size, as kinesin moves cargoes equally irrespective of size. It has been 
proposed that IFAPs, such as crosslinkers in the plakin family, may 
tether longer structures to cytoarchitecture, thus conferring another level 
of regulation to the process of IF transport and assembly (Chang and 
Goldman, 2004). 


VII. THE FINE LINE BETWEEN IFs AND IFAPs 


The classification of proteins as IFs has been debated over the years, 
with much discussion as to whether certain IFs are bona fide IFs or IFAPs; 
then, if they are IFs, how should they be classified? In the end, the 
classification has been based primarily on sequence homology and protein 
structure. Thus, synemin and paranemin, which were originally identified 
as IFAPs, have now been dubbed IFs (Bellin et al., 1999; Hemken et al., 
1997). In spite of sequence similarities, however, they differ from classical 
IFs in that these proteins require other IFs (such as desmin) for incorpora- 
tion into the polymer. In this respect, synemin and paranemin are similar 
to several other IFs including nestin, NF-H, tanabin in the nervous system 
(for review, see Coulombe et al., 2000) and syncoilin in muscle (Blake and 
Martin-Rendon, 2002). 

Another feature of these proteins is the divergence of their nonhelical
domains, which are sometimes quite extended and play important roles in 
organizing the filaments with which they copolymerize. For instance, the 
661-residue-long NF-H tail projects away from the IF surface, providing a 
crosslinking function important for interfilament spacing in neurons (Lee 
and Cleveland, 1996). The incorporation of substoichiometric amounts of 
nestin into type III and IV IFs may play a similar role in spacing these 
filaments within cells (Steinert et al., 1999). Likewise, the tail domain of 
synemin provides a binding site for both desmin and α-actinin, thus
integrating IF and actin systems at costameres and Z-lines in striated
muscle (Bellin et al., 2001). Together, these data suggest that IF networks 
employ multiple mechanisms to regulate their organization in different 
cells and tissues, sometimes incorporating spacer IFAPs into the core of 
the polymer, and other times, recruiting peripheral proteins from 
which the polymer may more readily dissociate during remodeling events. 


VIII. CONCLUSIONS AND FUTURE DIRECTIONS 


The number of IF genes has risen to more than 67, and IF gene 
expression is exquisitely regulated during development and differentia- 
tion. As the IF family has broadened, so too has our definition of what 
constitutes an IFAP. It is now clear that IFAPs include among their 
number more than cytolinkers and bundling proteins. Enzymes, molecu- 
lar motors, membrane receptors, adapters, and other molecules important 
for cell homeostasis and metabolism have all been identified as directly 
binding to IFs and modifying their functions. The varied and regulated 
associations between IFs and their partners dramatically increase the 
functional complexity of the cytoplasmic IF scaffold. 

There is much more to this functional complexity than we have been 
able to discuss in this review. For instance, we have not touched on the 
complex intranuclear network mediated by the nuclear lamins, which is 
critically important for human health as reflected in the numerous disease 
phenotypes that are emerging due to mutation in lamins and their asso- 
ciated protein partners. We have also not discussed the association of IFs 
with molecules involved in bacterial and viral pathogenesis, such as the 
Epstein-Barr virus latent infection membrane protein (LMP) (Liebowitz
et al., 1987) or viral kinases such as the herpes simplex virus US3 kinase 
(Murata et al., 2002) that may be important in remodeling IFs during 
infection. It is clear, however, from the many critical cellular functions we 
have discussed in this review, that the IF structural and signaling scaffold is 
involved in wide-ranging functions that are important for fundamental 
cellular processes, and that the diversity of the IF family and their partnerships
allows for tailoring these functions to the many cell types and tissues
found in higher organisms. 


ABSTRACT


Spectrin family proteins represent an important group of actin-bundling 
and membrane-anchoring proteins found in diverse structures from yeast 
to man. Arising from a common ancestral α-actinin gene through duplications
and rearrangements, the family has increased to include the spectrins
and dystrophin/utrophin. The spectrin family is characterized by the
presence of spectrin repeats, actin binding domains, and EF hands. With 
increasing divergence, new domains and functions have been added such 
that spectrin and dystrophin also contain specialized protein-protein inter- 
action motifs and regions for interaction with membranes and phospholi- 
pids. The acquisition of new domains also increased the functional 
complexity of the family such that the proteins perform a range of tasks 
way beyond the simple bundling of actin filaments by α-actinin in S. pombe.
We discuss the evolutionary, structural, functional, and regulatory roles of 
the spectrin family of proteins and describe some of the disease traits 
associated with loss of spectrin family protein function. 


I. INTRODUCTION


The eukaryotic cytoskeletal network is formed from a number of filamen- 
tous systems composed of polymers of actin, tubulin, or intermediate fila- 
ment proteins. The actin stress fibers, microtubules, and intermediate 
filaments generated are integrated in a highly organized manner that can 
be both dynamic and stable. The filamentous state and organization of
these proteins provide the cell with an internal scaffold essential to many
cellular processes, including mechanical strength, cellular morphology, 
adhesion, motility, intracellular trafficking, cell division, and networks for 
inter- and intracellular communication. The cytoskeletal network allows 
rapid remodeling in response to altered mechanical needs facilitated by 
the dynamic exchange of protein subunits within the system, and by the 
manner in which the network is linked through crosslinking proteins. 

Significant contributors to cell structure are those proteins that cross- 
link actin filaments or connect actin filaments to the cell membrane. 
Examples of such proteins can be found within the spectrin superfamily 
of cytoskeletal proteins. This discrete group is principally composed of 
the actin crosslinking protein α-actinin, and the membrane-associated
actin-binding proteins spectrin and dystrophin. 

The spectrin family of proteins is highly modular and shares common
structural elements including a calponin homology (CH) domain-containing
actin-binding domain, spectrin repeats, EF hands, and various
other signaling domains and motifs (Fig. 1). The spectrin family of 
proteins arose from work that originally focused on understanding the 
role that spectrin played biochemically in the organization and assembly 
of the cytoskeleton (reviewed in Gratzer, 1985). Spectrin possesses the 
ability to self assemble, but the molecular basis of this process could not 
be explained until more was known about the sequence and, ultimately, 
the structure of the protein. The work of Speicher and Marchesi (1984) 
provided the protein sequence of almost half of the α-spectrin chain. This
work identified spectrin as being a highly modular protein composed of 
many repeating 106-amino-acid units. The helical nature of these units was 
predicted to form triple-helical coiled-coil bundles that were dubbed 
spectrin repeats. Continued investigation and DNA sequencing led to the 
determination of several α-spectrin sequences from erythrocyte, Drosophila,
and brain (Dubreuil et al., 1989; Sahr et al., 1990; Wasenius et al., 1989). It 
was around this time that the DNA sequences of the related proteins α-actinin
and dystrophin were completed (Baron et al., 1987; Koenig et al.,
1988). These proteins were found to contain repeating units similar to 
those found in spectrin (Davison and Critchley, 1988) and, hence, the 
spectrin family of proteins was born. 

[Figure 1: schematic domain maps of dystrophin (427 kDa), α-spectrin (280 kDa), and α-actinin (103 kDa). Key: actin-binding domain, EF-hand motif, spectrin repeats, cysteine-rich region, coiled-coils, ZZ domain, SH3 domain, PH domain.]

Fig. 1. Structure of spectrin superfamily proteins. Modular domains within each
protein are clearly defined. Shaded spectrin repeats represent coiled coils involved in
dimerization events; incomplete repeats represent proportionally the number of coiled-coil
helices contributed by α- and β-spectrin when generating a complete spectrin
repeat during formation of the spectrin tetramer. The dashed lines indicate how two
spectrin heterodimers interact to form a functional spectrin tetramer. Asterisks in the
dystrophin spectrin repeats represent the position of the two greater repeats in
dystrophin with respect to utrophin, which in all other respects has a similar overall
structure. Numbers in the EF hand regions represent the number of EF hand motifs.


The sequencing of α- and β-spectrin, α-actinin, and dystrophin has
revealed similarities not only within the spectrin repeat, but also the other
domains and motifs present within these proteins. Subsequent analyses
have revealed an evolutionary pathway for the divergence of spectrin and
dystrophin from a common α-actinin ancestor via a series of rearrangements,
duplications, and evolution of repeats and other domains, as well
as the acquisition of unique domains such as PH, WW, and SH3 (Fig. 2).


II. EVOLUTION 


Fig. 2. Evolution of the spectrin superfamily. Rounded rectangles represent spectrin
repeats. Shaded rectangles denote α-actinin-like repeats involved in dimerization,
whereas unshaded rectangles represent repeats that were involved in duplication and/or
elongation events. The incomplete spectrin repeats involved in tetramer formation
are proportionally represented depending on the number of repeat helices each
protein contributes to the formation of a complete spectrin repeat. (Adapted from
Dubreuil, 1991; Pascual et al., 1997.) A dystrophin/utrophin ancestor probably diverged
from α-actinin at a relatively early stage and then underwent its own series of
duplications and acquisitions of new motifs.

The availability of complete sequences for α-actinin, spectrin, and
dystrophin has allowed the ancestry and evolution of the proteins to be
traced. Multiple sequence alignments and phylogenetic trees have been
combined with precise alignment of equivalent domains within each
protein to provide details of the relationships that exist between each of 
the protein domains. Analysis of the amino acid sequences from α-actinin,
spectrin, and dystrophin suggested that all three of the protein families
have arisen from a common ancestral protein that was similar to α-actinin
(Byers et al., 1992; Dubreuil, 1991) via a series of gene duplications and
gene rearrangements (Baines, 2003; Pascual et al., 1997; Thomas et al., 1997;
Viel, 1999). One of the diagnostic features of the spectrin superfamily of
proteins is the presence of the 106-120-residue repetitive unit referred to as the
spectrin repeat (Speicher and Marchesi, 1984). Sequence comparisons sug- 
gest that all three of these proteins share significant homology within their 
N-terminal actin-binding domains and in the spectrin repeats that form
the rod domains (Davison and Critchley, 1988). The spectrin repeats are
found in distinct multiples in each protein, resulting in a characteristic
actin crosslinking distance. α-Actinin contains four repeats, β-spectrin
contains 17, α-spectrin contains 20, and dystrophin contains 24. The
sequences of some spectrin repeats of α- and β-spectrin are similar in
many ways to the four repeats present in α-actinin (Dubreuil, 1991).
Within the cell, α-actinin and spectrin dimerize, although the spectrins
interact further to generate a functional tetramer (Fig. 1). Most notable is
that the ends of the native spectrin tetramer involved in the dimerization
event show remarkable similarity to the rod domain repeats of α-actinin
that also mediate dimer formation.

Indeed, homologous regions of all of the α-actinin protein domains can
be found within the sequences of α- and β-spectrin. For example, the
amino and carboxy terminal regions of α-actinin resemble the N-terminus
of β-spectrin and the C-terminus of α-spectrin, respectively (Byers et al.,
1989; Dubreuil et al., 1989). Phylogenetic analysis shows a common ancestor
for the first repeat of α-actinin and the first repeat of β-spectrin.
Similarly, each of the remaining repeats in α-actinin (2-4) correspond
to repeats 1 and 2 of β-spectrin and repeats 19 and 20 of α-spectrin
respectively (Fig. 2). This may have relevance for the function of these
repeats in the dimerization of these proteins (Pascual et al., 1997). It is the
similarity between these regions of α-actinin, the spectrins, and the simpler
domain organization of α-actinin that have led to the hypothesis that
these two protein families have evolved from an α-actinin-like precursor.

Spectrin is a much more elongated protein compared to α-actinin due
to the additional number of repeats. The additional repeats are more
closely related to one another than repeats common to both α-actinin and
spectrin. The spectrin repeat sequences are the most divergent in dystro- 
phin and its homologue utrophin (Winder et al., 1995a), most likely 
reflecting an earlier divergent event when compared to spectrin (Pascual 
et al., 1997). 

The additional molecular length of spectrin compared to the ancestral
α-actinin is believed to have arisen through two major duplication events
containing blocks of seven repeats (Pascual et al., 1997). The beginning of
the process can be described as the elongation of α-actinin by insertion of
a seven-repeat block between the second and third α-actinin repeats. That
block of repeats was then duplicated and an ancestor of the tetramerization
repeat was inserted between the blocks (Pascual et al., 1997) (Fig. 2).
Overall, the two-stage evolution of the superfamily is believed to have 
involved an initial dynamic phase involving intragenic duplications and 
concerted evolution, followed by a stable phase where repeat numbers 
became constant and their sequences evolved independently (Thomas
et al., 1997). The first phase of evolution resulted in the α-actinin, spectrin,
and dystrophin lineages via a series of duplication events. This process
gave rise to the necessary repeat lengths and numbers within the spectrin
and dystrophin lineages. The α-actinin lineage continued to evolve. Phylogenetic
analysis has indicated that the different isoforms found today in
modern vertebrates arose after the vertebrate-urochordate split, with the
muscle and nonmuscle isoforms evolving separately (Virel and Backman, 
2004). Thomas and colleagues (1997) hypothesized that, during a transi- 
tion period, the new genes began to acquire distinct crosslinking distances 
and that subsequent selection against longer or shorter proteins would 
result in a stabilization of protein length. The current length of spectrin 
repeats is evidently very stable, as there has been little change since the 
split of the arthropod vertebrate lineages. 


III. STRUCTURE


The spectrin superfamily is an important group of cytoskeletal proteins 
involved in many functions requiring crosslinking, bundling, or binding to 
filamentous actin. Each of the proteins within this family differs greatly in 
its specific biological function. However, they all share a surprising level of 
structural homology. The members of this protein family are composed
of a number of conserved domains: spectrin repeats, CH-domain-containing
actin-binding domains, EF-hand calcium-binding motifs, and various
signaling domains (Fig. 1). The actin-binding domain (ABD) is the most
N-terminal domain and can be found in α-actinin, β-spectrin, dystrophin,
and utrophin, but not α-spectrin. The presence of the ABD allows these
proteins to interact with F-actin in a variety of different cellular situations. 

The ABDs of these proteins comprise a tandem pair of CH domains, 
although the manner in which this domain interacts with F-actin is still 
unclear even after extensive investigation and modeling. The spectrin 
repeats form the rod domains of these proteins. There are four repeats 
in α-actinin and between 17 and 24 in spectrin and dystrophin. In general,
the spectrin repeats are responsible for the overall length of the protein
and they are usually recognized as modules for the generation of elongated
molecules and the separation of the specific N- and C-terminal domains
(Winder, 1997). Overall, the core structures of the spectrin repeats from
each family member are very similar, although the size of the repeating
regions differs slightly. Repeats are 122 residues in length in α-actinin,
106 residues in spectrin, and 109 residues in dystrophin and utrophin.
Each member of the spectrin family also contains a different number of
repeating elements. The four repeats in α-actinin are separated from the
ABD by a linker that allows a significant degree of flexibility between the
rigid rod and the ABD. In the functional α-actinin dimer, it is the spectrin
repeats that are responsible for the dimerization. Crystal structures of the
dimeric α-actinin rod domain show that the rod bends slightly along its
length and that the whole domain twists through approximately 90 degrees.
In conjunction with the rod domain and ABD, the flexible linker
that separates them contributes to α-actinin’s ability to crosslink actin
filaments that are oriented in either a parallel or antiparallel manner
(Ylanne et al., 2001a).

The rod domains of α- and β-spectrin are composed of 20 and 17
spectrin repeats, respectively. At the N-terminus of α-spectrin, the first
repeating segment begins with the “third helix” of what will become a
complete triple-helical coiled-coil structure, analogous to a spectrin repeat
when it interacts with the two helices that are found at the C-terminus of
β-spectrin (Fig. 1). This site allows the spectrin dimer to interact with
another dimer to form a spectrin heterodimer or tetramer. This is the
functional unit of spectrin within the erythrocyte. It has been shown that
specific repeats in both α- and β-spectrin are not just present as structural
modules that contribute to the length of the rod domain. Repeat 10 of
α-spectrin has been found to be slightly shorter than a typical spectrin
repeat and shows substantial homology to the SH3 domain of the Src
protein family (Wasenius et al., 1989), whereas repeat 15 of β-spectrin has
been found to be responsible for interaction with ankyrin (Kennedy et al.,
1991). The two most C-terminal spectrin repeats of α-spectrin (21 and 22)
and the first two of β-spectrin (1 and 2) once again show differences in
sequence and structure from a typical spectrin repeat. This particular
section of α-spectrin shows homology with the C-terminus of α-actinin,
whereas repeats 1 and 2 of β-spectrin share homology with the N-terminus.
It has been found that these repeats are responsible for the dimerization
of α- and β-spectrin in a manner analogous to the four spectrin repeats of
the α-actinin rod domain (reviewed in Winkelmann and Forget, 1993).
The rod domains of dystrophin and its close relative utrophin have not
been found to mediate dimerization. It seems that the spectrin repeats of
these two proteins function primarily to separate the N- and C-termini.
However, it should be noted that the rod domains of dystrophin and
utrophin are able to associate with filamentous actin, although the manner
of interaction differs for each protein (Amann et al., 1999; Rybakova
et al., 2002).
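
The repeat counts and nominal repeat lengths quoted above can be combined
into a rough estimate of rod-domain size. The short Python sketch below is
illustrative arithmetic only: it assumes that every repeat in a rod has the
nominal length given in the text (real repeats vary, as noted for repeat 10
of α-spectrin), and the names RodDomain, RODS, and summarize are
hypothetical helpers introduced for this example. Dystrophin and utrophin
are omitted because the text gives only a range for their repeat numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RodDomain:
    """Nominal description of a spectrin-family rod domain (illustrative)."""
    protein: str
    repeats: int              # number of spectrin repeats quoted in the text
    residues_per_repeat: int  # nominal repeat length quoted in the text

    def rod_residues(self) -> int:
        # Assumption: every repeat has the nominal length; linkers are ignored.
        return self.repeats * self.residues_per_repeat


# Values taken from the text: 4 repeats of 122 residues in alpha-actinin;
# 20 and 17 repeats of 106 residues in alpha- and beta-spectrin.
RODS = [
    RodDomain("alpha-actinin", 4, 122),
    RodDomain("alpha-spectrin", 20, 106),
    RodDomain("beta-spectrin", 17, 106),
]


def summarize(rods):
    """Return a mapping of protein name to estimated rod length in residues."""
    return {rod.protein: rod.rod_residues() for rod in rods}


if __name__ == "__main__":
    for name, residues in summarize(RODS).items():
        print(f"{name}: roughly {residues} residues in the rod domain")
```

On these assumptions the spectrin rods come out roughly four times longer
than the α-actinin rod, which is in keeping with the description of spectrin
as the much more elongated molecule.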

The C-termini of spectrin family proteins also exhibit a variety of
structural similarities including motifs involved in protein-protein
interactions (Fig. 1). EF-hand motifs are found in all but β-spectrin. The
function of this domain is most notable in α-actinin, where the binding of
calcium is able to affect the interaction of the ABD with F-actin. This is
the case only for the nonmuscle isoforms of α-actinin, however, as the
muscle isoforms are calcium insensitive (Blanchard et al., 1989). The
C-terminus of α-spectrin has also been found to contain EF hands similar to
those of α-actinin. It is believed that these structures may play a role in
modulating the functional conformation of the β-spectrin ABD when N-
and C-termini are juxtaposed in the spectrin heterodimers. The EF hands
of dystrophin have been predicted to be unable to bind calcium, although
it is thought that they have an important structural role (Huang et al.,
2000). Finally, a number of smaller motifs have been found at or within
the C-termini of these proteins. The nonerythroid form of β-spectrin
has been shown to possess a pleckstrin homology (PH) domain at its
C-terminus. This fold is conserved in proteins that can interact with
phospholipids and may allow direct interaction with the cell membrane.
The remainder of the motifs are found within the C-terminus of dystrophin
and utrophin. These motifs consist of WW and ZZ domains that,
together with the EF-hand region, form the cysteine-rich region. The WW
domain is an example of a protein-protein interaction module that binds
proline-rich sequences (Ilsley et al., 2002), whereas the ZZ domain is a
zinc finger motif also involved in mediating protein-protein interactions
(Ponting et al., 1996). At the extreme C-terminus of dystrophin and
utrophin are two helices predicted to form dimeric coiled-coils (Blake 
et al., 1995) that mediate interaction with the dystrophin family proteins 
dystrobrevin and DRP2. 


IV. FUNCTION 


Spectrin is a common component of the submembranous cytoskeleton. 
It was first identified as a major constituent of the erythrocyte membrane 
cytoskeleton, but has since been found in many other vertebrate tissues as 
well as in the nonvertebrates Drosophila, Acanthamoeba, Dictyostelium, and 
echinoderms (Bennett and Condeelis, 1988; Byers et al., 1992; Dubreuil 
et al., 1989; Pollard, 1984; Wessel and Chen, 1993). The ease with which
spectrin could be isolated from erythrocyte ghosts made it an ideal candi- 
date for the study of the biochemical processes involved in the assembly 
and organization of the cytoskeleton (Gratzer, 1985). 

The human erythrocyte possesses a characteristic biconcave shape 
and remarkable viscoelastic properties. Electron microscopy studies per- 
formed on red blood cells (RBC), ghosts, and skeletons revealed a 
two-dimensional lattice of cytoskeletal proteins. This meshwork of pro- 
teins was thought to determine the elastic properties of the RBC. This 
notion was supported further when it was found that detergent-extracted
skeletons exhibited shape memory; since spectrin was the major constitu- 
ent of these skeletons, it was suggested that it was responsible for the 
elastic properties. The protein lattice that laminates the inner surface of 
the erythrocyte membrane is formed from interactions between actin, 
spectrin, and integral membrane proteins (Bennett and Gilligan, 1993). 
The lattice is predominantly formed from α- and β-spectrin dimers that
again dimerize to generate tetramers roughly 200 nm in length. Five or six 
of these tetramers bind through their tail ends to a junctional complex, 
consisting of filamentous actin and band 4.1 (reviewed in Winkelmann 
and Forget, 1993). The molecular function of the complete spectrin 
heterodimer relies on the inter- and intramolecular interactions that 
occur at two key points within the spectrin molecule. These associations 
take place at the head end, where interchain binding between α- and
β-spectrin gives rise to a heterodimer or the tetramer. The tail end of the
molecule contains sites responsible for the interchain binding between 
spectrin chains integrating spectrin tetramers into a network via interac- 
tions with actin, protein 4.1, and other binding partners. Between the 
head and tail regions of the molecule, much of the overall length of 
spectrin is attributed to the number of spectrin repeats, 20 repeats for
α-spectrin and 17 repeats for β-spectrin. The spectrin tetramers associate
with short actin oligomers to form a regular repeating polygonal lattice. 
This network is coupled to the membrane via a limited number of direct 
and indirect contacts between spectrin and integral membrane proteins. 
These attachments consist of interactions between ankyrin and band 3 
protein, and between protein 4.1 and glycophorin C. 

α-Actinin is the smallest member of the spectrin family of proteins
(Pascual et al., 1997). It was first described as an actin crosslinker in
skeletal muscle, but has subsequently been found to be ubiquitously
expressed (Otto, 1994). Additional family members have been found in
smooth muscle and nonmuscle cells (Blanchard et al., 1989) and are
localized at the leading edge, cell adhesion sites, focal contacts, and along
actin-stress fibers in migrating cells (Knight et al., 2000). The functional
unit of α-actinin is an antiparallel homodimer of polypeptide chains, each
of mass 94-103 kDa (Blanchard et al., 1989), in which the amino-terminal
CH domains together with the carboxy-terminal calmodulin (CaM) homology
domain form the actin-binding heads of the molecule (Critchley,
2000). The connection between these two heads is composed of four
spectrin repeats, which define the distance between the actin filaments
that are crosslinked. α-Actinin has three main biological functions. It is the
major thin filament crosslinking protein in the muscle Z-discs, where it
holds the adjacent sarcomeres together (Masaki et al., 1967). α-Actinin is
also found close to the plasma membrane where it crosslinks cortical actin
to integrins (Otey et al., 1990) and serves as a linker between
transmembrane receptors and the cytoskeleton. In nonmuscle cells, α-actinin is a
major component of stress fibers, contractile structures analogous to the
more organized units found in striated muscle (reviewed by Otey and
Carpen, 2004). α-Actinin is also a constituent of dense bodies (Lazarides
and Burridge, 1975), which are believed to be structurally and functionally
analogous to the sarcomere Z-disc.

Dystrophin is the product of the largest known gene within the human
genome, spanning approximately 2.5 Mb of genomic sequence and composed
of 79 exons (Coffey et al., 1992; Roberts et al., 1993). The absence of
the protein results in Duchenne muscular dystrophy
(Koenig et al., 1987). Dystrophin is predominantly expressed in skeletal
and cardiac muscle but small amounts are found in the brain. These full- 
length isoforms are under the control of three independently regulated 
promoters referred to as brain, muscle, and Purkinje, the names of which 
reflect the site at which dystrophin expression is driven. Additionally, four 
internal promoters give rise to truncated C-terminal isoforms, and alter- 
native splicing further increases the number of isoforms and variants. The 
spectrin repeats form the bulk of the protein (Fig. 1) and are thought to 
allow flexibility and give the molecule a rodlike structure. Dystrophin can 
be found associated with the plasma membrane of cardiac and skeletal 
muscle, where it interacts with the integral membrane protein dystrogly- 
can that binds to laminin on the extracellular face. The dystrophin-dys- 
troglycan complex further interacts with the integral membrane 
sarcoglycan proteins and peripheral membrane proteins syntrophin and 
dystrobrevin, which together comprise the dystrophin glycoprotein com- 
plex (reviewed in Winder et al., 1995a). This complex of proteins can then 
interact with F-actin via the N-terminus of dystrophin to form a flexible 
link between the basal lamina of the extracellular matrix and the internal 
cytoskeletal network (Campbell and Kahl, 1989; Rando, 2001). It is be- 
lieved that this complex serves to stabilize the sarcolemma and protect 
muscle fibers from contraction-induced damage. Indeed, the absence or 
mutation of dystrophin results in the X-linked myopathies Duchenne and 
Becker muscular dystrophies (DMD and BMD, respectively; reviewed in 
Blake et al., 2002). 

Within the skeletal musculature, dystrophin plays an important role in 
maintaining the integrity of the sarcolemmal membrane. Dystrophin is not 
able to perform this task alone and interacts with a number of other proteins 
that include dystroglycans, sarcoglycans, dystrobrevins, syntrophins, and 
sarcospan (Straub and Campbell, 1997). Mutation of dystrophin or, indeed, 
other components of this complex can result in a variety of disease
pathologies. Most notable is that of DMD, where a complete absence of 
dystrophin is observed, resulting in progressive muscle wasting and even- 
tual death of the affected individual. This complex of proteins is thought 
to provide a link between the cortical actin cytoskeletal network and 
laminin in the extracellular matrix (Ervasti and Campbell, 1993a,b). 
Hence, it is thought that dystrophin and the associated proteins provide 
a mechanically stabilizing role that protects the sarcolemmal membrane 
from the shear stresses generated during eccentric contraction of muscle 
(Petrof et al., 1993). Dystrophin has been found to localize adjacent to the 
cytoplasmic face of the sarcolemmal membrane in regions known as 
costameres (Porter et al., 1992; Straub et al., 1992). These assemblies of 
cytoskeletal proteins are involved in linking the force-generating sarco- 
meric apparatus to the sarcolemmal membrane (Craig and Pardo, 1983; 
Pardo et al., 1983). Costameres transmit contractile forces laterally through 
the sarcolemmal membrane to the basal lamina (Danowski et al., 1992). It 
has been found that dystrophin is not required for the assembly of several 
of the proteins that comprise costamere-like structures, but its absence 
does lead to an altered costameric lattice (Ehmer et al., 1997; Minetti et al., 
1992; Porter et al., 1992; Williams and Bloch, 1999). These data suggest 
that dystrophin plays an important role in the organization or stability of 
costameres, perhaps via an interaction with actin filaments (Rybakova et al.,
2000). Rybakova and colleagues (2000) showed that the dystrophin complex
formed a mechanically strong link between the sarcolemma and the
costameric cytoskeleton through interaction with γ-actin filaments.

Following the discovery of the dystrophin gene, another cDNA was
identified that showed considerable homology to that of dystrophin (Love 
et al., 1989). Initially this protein was referred to as dystrophin-related 
protein (DRP), but once cloned and sequenced (Tinsley et al., 1992) it was 
subsequently renamed utrophin due to a ubiquitous expression pattern 
compared to that of dystrophin. In muscle cells, utrophin shares a high 
degree of functional similarity with dystrophin (Claudepierre et al., 1999; 
Earnest et al., 1995; Loh et al., 2000; Matsumura et al., 1993; Nguyen et al., 
1991; Pons et al., 1994; Raats et al., 2000) and has been proposed as a 
potential therapeutic replacement for dystrophin in the treatment of 
DMD (Matsumura et al., 1992; Pearce et al., 1993; Tinsley et al., 1992). 


A. Actin-Binding Domains 


The actin-binding domains (ABD) of spectrin, α-actinin, and dystrophin
consist of approximately 240 residues that comprise two functionally
distinct but structurally equivalent domains (Gimona and Winder, 1998;
Matsudaira, 1991). These domains have been named calponin homology or
CH domains (Castresana and Saraste, 1995) based on their sequence
similarity to the smooth muscle regulatory protein calponin (Winder and Walsh,
1990), where this domain is found only as a single copy. In contrast to the
single CH domain seen in calponin, the double domain found in the
spectrin superfamily of proteins has been proposed to have arisen through a
process of gene duplication (Castresana and Saraste, 1995; Matsudaira,
1991). Furthermore, phylogenetic analysis of CH-domain-containing
proteins has revealed that there is greater similarity among the N-terminal
CH domains (CH1), and among the C-terminal CH domains (CH2), across
the classes of actin-binding proteins than between the two CH domains
found within the same protein (Banuelos et al., 1998; Keep et al., 1999a;
Korenbaum and Rivero, 2002; Stradal et al., 1998). Some actin-binding
proteins have been found to contain a single CH domain, although the
binding interaction requires additional elements since the isolated CH
domains from these proteins do not exhibit analogous actin binding when
compared to the whole protein (Gimona and Mital, 1998). Functionally,
it should be noted that bacterially expressed CH domains corresponding
to the actin-binding domains of α-actinin, dystrophin, and utrophin
were found not to be equivalent (Way et al., 1992;
Winder et al., 1995b). When the two CH domains are separated and then
used in actin-binding experiments, it has been found that only CH1 has
the ability to interact with F-actin, although the affinity is reduced. CH2
has little or no intrinsic actin-binding activity, but its presence is
clearly functionally important. It has been shown that both of the CH
domains are required to achieve the greatest interaction with F-actin and
hence, single CH domains are not regarded as actin-binding domains per
se (Gimona and Winder, 1998).

The first crystal structures of CH domains were published in 1997. 
These were the CH2 domain of spectrin (Carugo et al., 1997) and the 
N-terminal ABD of fimbrin, comprising two CH domains (CH1.1 and 
CH2.1) (Goldsmith et al., 1997). The crystal structure of the second utro- 
phin CH domain (CH2) was later published by Keep et al. (1999a). It was 
not long, however, before the complete actin-binding domain of utrophin 
was crystallized (Keep et al., 1999b), followed closely by dystrophin
(Norwood et al., 2000). The CH domain is a compact globular domain
that appears to show a high degree of structural conservation. Overall, the
domain comprises four main α-helices (A, C, E, and G) that are approximately
11 to 18 residues in length and exhibit a roughly parallel orientation.
Three shorter helices (B, D, and F) are less regular and form lesser
secondary structure elements. The structure can be considered to comprise
a number of layers. The core of the domain is formed by a parallel
arrangement of helices C and G, which are then sandwiched between
helix E and the N-terminal helix A (Broderick and Winder, 2002). 

The crystal structures of CH domains from α-actinin, dystrophin,
utrophin, fimbrin, spectrin, and plectin (Garcia-Alvarez et al., 2003) have been
solved to date. These structures have given insight into certain aspects of
CH domain function, but have also raised many new questions regarding
the interaction of these domains with actin. The dimeric organization
displayed by the crystallized utrophin and dystrophin ABDs contrasts
strongly with that of the α-actinin and the related fimbrin actin-binding
domain (K. Djinovic-Carugo, personal communication; Goldsmith et al.,
1997). When crystallized, fimbrin and α-actinin do so as compact monomers,
where the two CH domains fold back on themselves to form a
compact globular structure. These same interfaces are involved in the
dimerization seen in utrophin, except it is the CH1 and CH2 domains
from separate molecules that interact. The preservation of an interface
between two domains of the same or related proteins when in monomeric
or oligomeric forms is known as three-dimensional domain swapping
(Schlunegger et al., 1997). Recent cryo-EM reconstructions of these
domains with F-actin have not served to definitively resolve the mode of
interaction of these proteins with F-actin. Reconstructions of fimbrin
bound to F-actin have been modeled on a compact conformation (Hanein
et al., 1998) and the reconstruction of utrophin with F-actin on an extended
conformation (Moores et al., 2000b). Electron diffraction and modeling
of the α-actinin molecule bound to F-actin showed that the ABD could
be associating as an open bilobed structure (Tang et al., 2001; Taylor and
Taylor, 1993). The cryo-EM reconstruction of α-actinin with F-actin
(McGough et al., 1994), however, revealed a more globular difference
density, suggesting that α-actinin might also associate with F-actin
in a manner more analogous to the compact mode of interaction seen
in the fimbrin crystal structure. The helical linkers between the two CH
domains of these proteins may play an important role in determining the
flexibility between the two CH domains and, subsequently, the manner
in which they interact with F-actin. Gel-filtration studies of the utrophin ABD have
shown it to be monomeric when in solution (Winder et al., 1995a). Hence, 
the crystal structures of the ABDs from fimbrin and utrophin may repre- 
sent two conformational extremes within this class of actin-binding pro- 
teins (Keep et al., 1999b). Several modes of interaction with F-actin have 
been demonstrated in cryo-EM studies with the utrophin ABD (Galkin 
et al., 2002). 

Within the N-terminal ABDs of spectrin superfamily proteins, three
actin-binding sites (ABS) have been delineated (ABS1, ABS2, and
ABS3). The first and third ABS have been localized to the α1 helix in
the first and second CH domains, respectively. These sites were originally
identified using synthetic peptides derived from dystrophin (Levine et al.,
1990, 1992). The second ABS, corresponding to helices α5 and α6 in the
CH1 domain, was first identified in the Dictyostelium actin-gelation factor
ABP-120 (Bresnick et al., 1990, 1991). ABS2 was later identified in α-actinin
using in vitro actin-binding studies with glutathione S-transferase (GST)
fusion proteins (Kuhlman et al., 1992). It was later recognized that the ABS
sequences were not part of a fully folded globular protein and hence
residues not normally involved in actin-binding may have been allowed to
interact with actin. Structural approaches have begun to shed light on the
mechanism by which the spectrin superfamily of proteins interacts with
the cytoskeleton. Biochemical work studying the actin-binding sites of
α-actinin (Lebart et al., 1990, 1993; McGough et al., 1994; Mimura and
Asano, 1987) and dystrophin (Levine et al., 1992) has been successful
in identifying actin subdomain 1 as an important binding site for this
particular class of proteins. Electron microscope reconstructions of F-actin
decorated with α-actinin (McGough et al., 1994; Tang et al., 2001) reveal
that actin subdomain 1 forms the major site of interaction. Helical
reconstruction also revealed that these two proteins interacted with adjacent
actin monomers on the long pitch helix, a site apparently shared by most
F-actin binding proteins (McGough, 1998).

The crystal structure of the utrophin ABD suggested an alternate model 
to that of the association of fimbrin with actin (Keep et al., 1999b). As 
utrophin crystallized as a head-to-tail dimer, each of the monomers 
adopted an extended conformation. This arrangement placed the pre- 
dicted ABSs on the surface of the protein, clearly enabling interaction with 
actin. The dimerization of utrophin seen in the crystal conserved the inter- 
CH domain interfaces, suggesting that utrophin may adopt a more com- 
pact conformation when in solution. To date, there is no evidence to 
support anything other than a monomeric conformation of utrophin 
when in solution, as the binding stoichiometry with actin is 1:1 (Keep 
et al., 1999b; Moores and Kendrick-Jones, 2000a; Winder, 1996). However, 
the crystallization of utrophin as a dimer suggested that the ABDs of this 
protein might be flexible and allow actin-binding in an open conformation,
even when utrophin exists as a monomer in solution. Moores et al.
(2000b) developed this idea further by demonstrating a model of
utrophin-actin binding that contrasted with that of fimbrin bound to actin. A
pseudo-atomic model of utrophin bound to F-actin in an open conformation
was detailed, showing that all of the ABSs could be directly involved in
the actin interaction. This mode of binding was found to create a different
conformational change within actin compared to that caused by fimbrin,
suggesting that an induced fit mechanism involving conformational
flexibility of actin and utrophin may be crucial to their interaction 
(Moores et al., 2000b). The validity of this model has recently been called 
into question (Galkin et al., 2003), however, and alternative models have 
been proposed (Galkin et al., 2002). The comparison between the previ- 
ously published model of fimbrin binding to actin and that of utrophin 
shows that both proteins possess a totally different mode of interaction 
even though their ABDs share sequence homology. Such a difference is 
likely to be related to the overall function of each protein, but the 
utrophin model has identified an alternate means of actin association 
within a large family of proteins important to cellular organization. It 
should be noted that Moores et al. (2000b) did not exclude the compact 
orientation in utrophin-actin binding. However, their model relies on
inherent protein flexibility. Indeed, models of utrophin ABDs have been 
generated where association with F-actin is made in a closed compact 
conformation (Sutherland-Smith et al., 2003). Recent crystallographic 
and calorimetric studies of the plectin ABD demonstrated that while the 
two CH domains associate to form a closed conformation in the crystal 
structure, binding to F-actin induces the open conformation (Garcia- 
Alvarez et al., 2003). Elucidation of spectrin’s interaction with actin at 
the molecular level, however, has been hampered by an inability to express 
a functional spectrin ABD in isolation. 


B. Spectrin Repeat Region 


The rod domain of α-actinin is the shortest within this family of proteins
and comprises just four spectrin-like repeats. Given the reduced length of
this domain, it is feasible to assume a greater degree of rigidity, especially
as the functional unit of α-actinin is a dimer. The spectrin repeats of the
rod domain are essential to the dimerization of α-actinin. The association
of the rod domains from two monomers leads to a much more stable and 
less flexible domain overall (Djinovic-Carugo et al., 2002). Spectrin, dys- 
trophin, and the related protein utrophin all contain many more spectrin 
repeats that seem to play a more direct role in the cellular function of 
these proteins. Studies performed by Pasternak and colleagues (1995) 
show the sarcolemma of muscles from the mdx mouse to be four times 
less stiff than in controls, demonstrating directly that dystrophin and its 
associated proteins reinforce the stability of the sarcomere. Spectrin forms 
roughly 5% of the total protein of the erythrocyte and is pivotal to the 
formation and function of the submembranous skeleton in red blood 
cells. This erythrocyte plasma membrane possesses remarkable mechanical 
properties, more like an elastic semisolid (Evans and Hochmuth, 1977).
This allows storage of energy during deformation (e.g., squeezing through 
a narrow capillary), but allows the erythrocyte to return to its normal shape 
once the deformation ceases (see review by Bennett and Gilligan, 1993). 

The ability of erythrocytes to survive repeated deformation is essential to 
their physiological function and their prolonged life within the vascula- 
ture. Direct evidence has determined that spectrin is the major factor in 
providing the elastic properties exhibited by the erythrocyte. Studies of 
erythrocytes from patients suffering from hereditary spherocytosis clearly 
demonstrate this (Waugh and Agre, 1988), as reduced quantities of 
spectrin result in a greater extent of clinical severity and a reduction in 
the force required to deform the affected erythrocytes (Agre et al., 1985, 
1986). This perhaps results from an overall effect of the structure of the 
submembranous lattice on the whole, but the properties of this mesh- 
work of proteins can be linked to the properties of the spectrin repeats 
found within the rod domain. For many cytoskeletal and adhesion pro- 
teins, the ability to survive extension and deformability is pivotal to their 
role in a cellular environment. Atomic force microscopy (AFM) has been 
employed to examine the extensibility of spectrin repeats (Rief et al., 
1999). These studies have determined that the α-helical spectrin repeat
can be forced to unfold in a stochastic one-domain-at-a-time fashion (Rief 
et al., 1999). 

The availability of tandem spectrin repeat structures from nonerythroid
α-spectrin (Grum et al., 1999) and the four repeat rod domain of α-actinin
(Ylanne et al., 2001a) has shown that individual spectrin repeats should
not be considered as independent units, and that the interspectrin repeat
links are actually formed from contiguous helices rather than flexible
linkers (Law et al., 2003). This has implications for the manner in which
spectrin repeats respond to mechanical stress, inasmuch as the repeats
within the rod domain do not unfold one at a time. Rather, they are subject
to a cooperative manner of forced unfolding (Law et al., 2003). Helical
linkers between spectrin repeats have been implicated to help explain the
extensibility and elasticity observed within the erythrocyte cytoskeleton. 
The unfolding of spectrin repeats might explain thermal-softening 
(Waugh and Evans, 1979) and strain softening of the RBC submembra- 
nous network (Lee and Discher, 2001; Markle et al., 1983). Additionally, it 
has also been shown that tandem spectrin repeats are thermodynamically 
more stable than individual repeats and that tandem repeats unfold 
in unison, behaving similarly to an individual repeat (MacDonald and 
Pozharski, 2001).

The rod domains of spectrin family proteins are assumed to function
solely as structural spacers that serve to separate the C-terminus from the
N-terminal actin-binding domain. While this is likely to be the case for
α-actinin (and to a certain extent in spectrin), natural mutations and
transgenic experiments would suggest otherwise for dystrophin. It was 
widely thought that the rod domains of dystrophin (and utrophin) served 
as flexible spacers or shock absorbers between the actin cytoskeleton and 
the sarcolemmal membrane (Winder et al., 1995a). However, is the length 
of this “shock absorber” crucial to the function of the protein? An 
individual with a large deletion in the dystrophin gene encompassing 
46% of the entire protein and 73% of the rod region (repeats 4-19) 
presented with a very mild BMD phenotype (England et al., 1990). This 
would tend to suggest that rod domain length is not essential with regards 
to the proposed shock-absorbing role of the protein. The fact that a 
dystrophic phenotype is observed, regardless of how mild, would suggest 
that there is functional importance regarding the length of the rod 
domain. However, dystrophin minigenes have been designed on the basis 
of this shortened dystrophin and have been used to correct the dystrophic 
phenotype in mdx mice (Phelps et al., 1995; Wells et al., 1995). 

Similarly, a minispectrin has also been generated that consists of the
N-terminal ABD and the first two spectrin repeats of β-spectrin, and the
C-terminus of α-spectrin consisting of the last two spectrin repeats and
the calmodulin-like domain. This construct was still able to dimerize, bind 
F-actin, and induce the formation of bundles, but it is unlikely to be 
functional in vivo (Raae et al., 2003). The shock-absorbing role of the 
spectrin repeats found within these proteins is widely accepted, which 
leads to the question of why there are so many coiled coils in dystrophin 
and utrophin. α-Actinin contains only four spectrin repeats, which mediate
dimerization and result in a rather inflexible link between the termini
of the protein (Ylänne et al., 2001a). In this case, the length of the rod
domain clearly defines the distance at which filamentous actin can be
crosslinked. α-Actinin is localized to structures that require actin filaments
to be crosslinked in either a parallel or antiparallel fashion. This requires
the α-actinin dimer to bind actin filaments in orientations separated by as
much as 180 degrees. Furthermore, α-actinin is able to accommodate a
range of inter-actin-filament crosslinking distances, from 15 to 40 nm (Liu
et al., 2004; Luther, 2000; Taylor et al., 2000). The domain is essential to 
the formation of the functional dimer and for the separation of the C- and 
N-termini of the protein, but it would seem that the flexible hinge that 
separates the rod from the ABD, and the ABD itself, play an important role 
in determining the ultimate crosslinking distance. 



C. Other Binding Partners 


The repeating constituents of the rod domains of spectrin family proteins
were generally regarded as modules for the construction of elongated 
molecules (Winder, 1997). However, this is not the only function of spec- 
trin repeats. It is widely accepted that proteins containing spectrin repeats 
are localized to cellular sites that experience significant mechanical stress, 
and the properties of the spectrin repeat can be used to explain this 
functionality (see Section IV.B above). Additionally, some spectrin repeats
have acquired functions beyond a purely structural role; these are able to
interact with a variety of structural and signaling proteins (Djinovic-Carugo 
et al., 2002). 

The function of spectrin superfamily proteins is particularly evident 
when taken in context of their cellular localization. They often form 
flexible links or structures that allow interactions with the cellular cyto- 
skeletal architecture and the membrane. In both spectrin and dystrophin, 
such a function is performed, but the spectrin repeats of these molecules 
are also able to interact with actin and contribute to binding. A portion of 
the dystrophin rod domain that spans repeats 11-17 contains a number
of basic repeats that allow a lateral interaction with filamentous actin 
(Rybakova et al., 2002). The homologous utrophin can also interact laterally
with actin. This interaction is distinct from that of dystrophin, as the
utrophin rod domain lacks the basic repeat cluster and associates with
actin via the first ten spectrin repeats (Rybakova et al., 2002). β-Spectrin
also exhibits an extended contact with actin via the first spectrin repeat. In 
this situation, it was found that the extended contact increased the associ- 
ation of the adjacent ABD with actin (Li and Bennett, 1996). In conjunc- 
tion with this interaction, it has been found that the second repeat is also 
required for maximal interaction with adducin (Li and Bennett, 1996), a 
protein localized at the spectrin-actin junction that is believed to contrib- 
ute to the assembly of this structure in the membrane skeletal network 
(Gardner and Bennett, 1987). In the erythrocyte cytoskeletal lattice, 
β-spectrin interacts with ankyrin, which in turn binds to the cytoplasmic
domain of the membrane-associated anion exchanger. This indirect link
to the cellular membrane occurs via repeat 15 of β-spectrin (Kennedy et al.,
1991) and is largely responsible for the attachment of the spectrin-actin 
network to the erythrocyte membrane (reviewed in Bennett and Baines, 
2001). A much larger number of direct links to transmembrane proteins 
have been determined for the spectrin repeats of α-actinin (reviewed in
Djinovic-Carugo et al., 2002). 

The crystal structure of the α-actinin rod domain (Ylänne et al., 2001a)
has allowed the analysis of surface features, leading to predictions of
possible protein-protein interaction sites. It was found that the most
conserved surface residues were acidic in nature, which would correlate 
well with the relatively short basic sequences that can be found within the 
cytoplasmic domains of many transmembrane proteins (Ylänne et al.,
2001b). α-Actinin has been found to provide a direct link with a variety
of transmembrane proteins including integrins, ICAMs, L-selectin,
Ep-CAM, ADAM12, and NMDA receptor subunits (see Djinovic-Carugo et al.,
2002 for references). The α-actinin rod domain is also involved in a
number of dynamic and regulatory interactions that involve interactions
with titin (Young et al., 1998), myotilin (Salmikangas et al., 1999), ALP (Xia
et al., 1997), and FATZ (Faulkner et al., 2000) at the Z-disk of striated
muscle and interactions with Rho-kinase type protein kinase N (PKN)
(Mukai et al., 1997). All of these interactions occur through the rod
domain of α-actinin and demonstrate the versatility of the rod domain
as a binding site for the interactions with these proteins (Djinovic-Carugo 
et al., 2002). Spectrin and dystrophin rod domains have also been demon- 
strated to interact directly with lipid surfaces, suggesting a lateral asso- 
ciation with biological membranes (An et al., 2004; DeWolf et al., 1997; 
Le Rumeur et al., 2003; Maksymiw et al., 1987). 


V. REGULATIONS OF INTERACTIONS 


The spectrin family of proteins, depending on the particular function, 
has numerous smaller motifs and binding sites for interaction with other 
proteins. These regions are important, as they are major protein-protein 
or protein-membrane interaction modules that bind to F-actin, proline- 
containing ligands, and/or phospholipids. Spectrin and dystrophin/utro- 
phin have all acquired copies of such domains since their evolution from 
α-actinin, presumably as a consequence of their more diverse roles in
the cell. 


A. CH Domains 


The calponin homology (CH) domain has been identified in many 
molecules of differing function. However, its presence usually signifies 
an interaction of some sort with the actin cytoskeleton via an association 
with F-actin. The domain was initially identified as a 100-residue motif 
found at the N-terminus of the smooth muscle regulatory protein calponin 
and, hence, was termed the CH domain (Castresana and Saraste, 1995). 
The refinement of algorithms for the identification of distinct protein 
motifs has allowed the identification of CH domains in proteins that range 
in function from crosslinking to signaling (Korenbaum and Rivero, 2002).
Despite the functional variability of this domain, the secondary structure is 
conserved remarkably well between proteins that contain it (Bramham 
et al., 2002). 

Several mechanisms have been identified that seem to regulate the CH 
domains found in spectrin, dystrophin, and α-actinin. These range from
effects induced by calcium via EF-hand motifs to PIP2 binding,
phosphorylation, and interactions with calmodulin. The actin-binding properties
of the nonmuscle isoforms of the F-actin crosslinker α-actinin can be
regulated via the presence of EF hands. Calcium does not directly regulate
the interaction of α-actinin’s CH domains with F-actin, but it does bind to the
EF-hand motif present in the molecule. As α-actinin dimerizes, this brings
the CH domains and EF hands in the antiparallel dimer into close association.
The conformational changes induced in the EF-hand motif can then
exert an effect on the CH domains to influence the interaction with
F-actin (Noegel et al., 1987; Tang et al., 2001). α-Actinin has also been
found to bind phosphatidylinositol (4,5)-bisphosphate (PIP2) at the muscle
Z-line (Fukami et al., 1992). The PIP2-binding site has been delineated
to a region immediately C-terminal of the third ABS (Fukami
et al., 1996), although the precise mechanism of control is not known
for this region. It has been found, though, that in nonmuscle cells where
α-actinin is associated with actin, this region contained bound PIP2, whereas
free α-actinin did not. This implicates PIP2 in the activation of
α-actinin-induced actin bundling (Fukami et al., 1994).

Calmodulin has also been shown to regulate the interaction of the ABDs 
from dystrophin (also utrophin) and α-actinin by binding directly to the
CH domains (Bonet-Kerrache et al., 1994; Jarrett and Foster, 1995; Winder 
et al., 1995b) suggesting a potential role for modulating the attachment of 
these proteins to the cytoskeleton. 

Recently, it has been shown that α-actinin is phosphorylated by focal
adhesion kinase (FAK) and that this phosphorylation reduces the ability of
α-actinin to bind actin (Izaguirre et al., 2001). The site of tyrosine
phosphorylation is N-terminal to the first CH domain in a region that is most
conserved between spectrin family proteins. 


B. EF Hands 


EF-hand regions are involved in the chelation of up to two divalent 
calcium cations (occasionally magnesium) via an interaction through a 
paired helix-loop-helix structure (Tufty and Kretsinger, 1975). Binding of 
calcium to this globular domain leads to a dramatic conformational
change from “closed” to “open,” exposing a hydrophobic surface that
binds to a target peptide, often helical in nature. However, divergent
evolution has led to a subset of EF hands that no longer chelate calcium 
and possibly serve an alternate function (Nakayama and Kretsinger, 1994). 
This is exemplified in α-actinin nonmuscle isoforms, where calcium is
bound via the EF hands, thus allowing regulation of actin binding. The
muscle-specific isoforms of α-actinin have lost the ability to bind calcium
through their EF hands (Blanchard et al., 1989), possibly to protect the 
muscle architecture from the potential destabilizing effect of calcium 
during calcium-induced contractions. 

Spectrin has also both retained and lost the ability to bind calcium. 
Calcium and calmodulin bind to human nonerythroid spectrins (αIIβII)
at sites that have either degenerated or are absent in erythroid spectrins
(αIβI) (Lundberg et al., 1992). The roles of nonerythroid spectrins are far
more diverse and, hence, calcium and calmodulin might participate in 
regulatory events not required in the erythrocyte (Buevich et al., 2004). 
The EF hands of α-spectrin are brought into close apposition with the
β-spectrin ABD once the proteins form the heterodimer. The EF hand
is then able to exert regulatory control over the actin-binding activity of 
the adjacent domain. The molecular details of how this is achieved are still 
to be determined, however. A similar interface is observed in α-actinin. It
is thought that the EF-hand region could engage the actin-binding do- 
main in a manner analogous to calmodulin binding a target peptide. 
Regulation of the interaction would be affected by the binding of calcium 
to the EF hand, which would cause a conformational change resulting in 
altered interaction surfaces. The calcium-binding activity of nonerythroid 
spectrin has been located to the two EF hands present in the C-terminus of 
the αII-spectrin (Lundberg et al., 1995; Trave et al., 1995). Buevich and
colleagues (2004) found that the EF hands in nonerythroid spectrin
exhibited a degree of cooperativity in their binding of calcium, suggesting
that EF1 binds before EF2 and modulates the affinity of EF2 for calcium,
although overall, calcium binding to α-spectrin has been found to be
much weaker than to other EF-hand-containing proteins such as troponin 
C and calmodulin (Zhang et al., 1995a). 

Each of the three EF-hand structures solved from the spectrin family
proteins exhibits unique structural and functional differences, even though
all are fundamentally similar. The α-spectrin (nonerythroid) EF hands
bind calcium and presumably perform some kind of regulatory role 
regarding the actin-binding function of spectrin (Trave et al., 1995). 
Due to the low calcium affinity, it is expected that calcium regulatory 
events involving spectrin would occur in areas of the cell that would 
experience a transient but significant fluctuation of calcium concentra- 
tion (Buevich et al., 2004). It is possible that the calcium-bound form of 
spectrin in the cell would be stabilized by accessory proteins, as nonerythroid
spectrin interacts with many proteins that are involved in regulatory
events and not just with the cytoskeleton. In muscle α-actinin
(isoform 2), the third and fourth EF hands can be referred to as “empty”
on the basis of a lack of key liganding residues and large insertions in the
helix-loop-helix motif. This muscle isoform of α-actinin is important in
striated muscle Z-disk structure, where it interacts with F-actin and titin. 
The structure of this complex was solved and showed the titin Z-repeat 
peptide bound in a groove formed by the partially open lobes of the two 
EF hands (Atkinson et al., 2001). The EF hands of dystrophin were solved 
as part of a larger structure that also included the adjacent WW domain 
(Huang et al., 2000). These EF hands had been predicted to be unable to 
bind calcium due to a lack of key liganding residues (Winder, 1997). The 
structure of the dystrophin WW-EF region indicated that the EF hands 
may play a structural role and that they are not required to bind either 
calcium or a target peptide (Huang et al., 2000). It is still to be elucidated 
if this region of dystrophin interacts with other target peptides, but as the 
EF hands are oriented in a closed and compact manner, it is difficult 
to see how these interactions would occur (Broderick and Winder, 
2002). Indeed, studies with constructs spanning both the WW domains 
and EF-hand regions of dystrophin and utrophin have failed to show any 
calcium-induced regulation of binding to β-dystroglycan (James et al.,
2000; Rentschler et al., 1999). 


C. Lipid Binding 

Pleckstrin homology (PH) domains are motifs that are approximately 
100 amino acids in length and have been identified in over 100 different 
eukaryotic proteins. They are thought to participate in cell signaling and 
cytoskeletal regulation via interactions with phospholipids (Lemmon and 
Ferguson, 1998; reviewed in Rebecchi and Scarlata, 1998). It has been 
suggested that these domains function as membrane anchors and tethers, 
as PH domains are often found within membrane-associated proteins 
(Ferguson et al., 1995). The domain was first recognized in 1993 (Haslam 
et al., 1993; Mayer et al., 1993; Musacchio et al., 1993) and was quickly 
followed by the determination of 3-D structures. 

The β-spectrin PH domain structure was solved in a lipid-free (Zhang
et al., 1995b) and lipid-bound form (Hyvonen et al., 1995). The role of the 
spectrin PH domain has been proposed to be part of the mechanism 
whereby spectrin associates directly with the membrane through binding 
phospholipids. The submembranous framework formed by spectrin is 
linked to transmembrane polypeptides via peripheral proteins such as
ankyrin and band 4.1 (Viel and Branton, 1996), and spectrin is essential
to the integrity of this network. The PH domain of spectrin has been
found to have a weak affinity and specificity for PI(4,5)P2 (Harlan et al.,
1994; Hyvonen et al., 1995). Lemmon and colleagues (2002) suggested
that the β-spectrin/membrane interaction is driven by a delocalized
electrostatic attraction between an anionic ligand and the positively charged
face of the polarized PH domain. The PH domain of spectrin appears to
fall into a class of PH domains that exhibit a moderate affinity for the
phosphoinositides. In cells, this polarized domain may direct a few spectrin
isoforms to PI(4,5)P2-enriched sites such as caveolae or focal adhesions
(Burridge and Chrzanowska-Wodnicka, 1996), where other determinants
of membrane association are likely to play an equal or more dominant role
in stabilizing attachment. Although membrane attachment is not necessarily
dependent on this domain, it has been shown that the PH domain of
the human βIΣ2 spectrin isoform binds to protein-depleted membranes
containing PI(4,5)P2 and to Ins(1,4,5)P3 in solution (Wang and Shaw,
1995). This domain localizes to plasma membranes in COS7 cells (Wang
et al., 1996). Ins(1,4,5)P3 binding has been found to perturb residues
located in or near loop 1 of the Drosophila spectrin PH domain, as is the
case for the N-terminal PH domain and the mouse form of β-spectrin
(Zhou et al., 1995). The binding site of β-spectrin has no elaborate
hydrogen-bonding network and the inositol ring has no specific contacts
with the protein, unlike the PH domain of PLC-δ1 (Ferguson et al., 1995).
Moreover, spectrin does not bind Ins(1,4,5)P3 on the same face as PLC-δ1,
whose binding pocket is located on the other side of the protein.

The β-spectrin PH domain binds weakly to all phosphoinositides and is
likely to associate with the negatively charged membrane surface via the 
positively charged face of the domain. Spectrin networks contain many 
spectrin molecules, and it is likely that the individual weak association with 
phosphoinositides is overcome by the overall collective interaction of 
many molecules. Such a mechanism of multivariant association allows only 
the assembled cytoskeletal components to interact strongly with cellular 
membranes, such as in the RBC. 


D. Polyproline Binding Domains 


The Src homology 3 (SH3) domain of α-spectrin was the first SH3
domain structure to be solved (Musacchio et al., 1992). The domain was 
initially identified as regions of similar sequence found within signaling 
proteins, such as the Src family of tyrosine kinases, the Crk adaptor 
protein, and phospholipase C-γ (Mayer et al., 1993). In the case of spectrin,
the specific target ligand and function are still to be identified. The
domain is approximately 60 residues in length and has been identified in
many proteins (Bateman et al., 1999; Rubin et al., 2000). The SH3 domain
continues to be identified within a variety of proteins; it is consequently
regarded as one of the most common modular protein interaction domains
found and is widespread in signaling, adaptor, and cytoskeletal
proteins alike (reviewed in Mayer, 2001). Due to the small size of this
domain, a search for a potential function focused on protein-protein 
interactions, with screening of expression libraries soon identifying seem- 
ingly specific binding partners (Cicchetti et al., 1992). Binding studies have 
indicated that the interaction sites of SH3 domains were proline-rich, with 
PxxP being identified as a core conserved binding motif (Ren et al., 1993). 
It should be noted that profilin and WW domains also make use of a 
similar mode of interaction with proline-rich helical ligands (Ilsley et al., 
2002; Kay et al., 2000; Zarrinpar and Lim, 2000). 

As mentioned above, the WW domain is another example of a protein- 
protein interaction module that binds proline-rich sequences (Kay et al., 
2000). Dystrophin and utrophin WW domains interact predominantly with 
the extracellular matrix receptor dystroglycan, which contains a type 1 WW 
motif of consensus PPxY (reviewed in Ilsley et al., 2002; Winder, 2001). A 
structure of a WW domain from dystrophin was solved recently as part of 
a structure including the EF-hand region, and also with and without a 
bound β-dystroglycan peptide (Huang et al., 2000).
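
The PxxP and PPxY consensus motifs described above can be located in a sequence with a simple pattern scan. The following is a minimal sketch only: the function name `find_motifs` and the example peptides are hypothetical illustrations, not real protein sequences, and a match to the consensus says nothing about whether an SH3 or WW domain actually binds that site.

```python
import re

# Consensus motifs as named in the text; "x" is any residue.
# A lookahead is used so that overlapping occurrences are all reported.
PXXP = re.compile(r"(?=(P..P))")   # core SH3-binding motif
PPXY = re.compile(r"(?=(PP.Y))")   # type 1 WW-binding motif


def find_motifs(sequence, pattern):
    """Return (0-based start, matched text) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    toy_peptide = "RSPPPYVPP"  # illustrative toy peptide (assumption)
    print(find_motifs(toy_peptide, PPXY))
    print(find_motifs(toy_peptide, PXXP))
```

A scan of this kind is only a first filter; flanking residues and, as discussed below, tyrosine phosphorylation within the motif also influence binding.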

Chung and Campanelli (1999) found that the interaction between 
utrophin and β-dystroglycan mirrored that of dystrophin. This is mainly
mediated by the WW domain, which recognizes the PPPY peptide at the
carboxy terminus of β-dystroglycan. Adhesion-dependent tyrosine
phosphorylation of β-dystroglycan within the WW domain binding motif has
been found to regulate the WW domain-mediated interaction between
utrophin and β-dystroglycan. This was the first demonstration of
physiologically relevant tyrosine phosphorylation of a WW domain ligand and
parallels the tyrosine phosphorylation of SH3 domain ligands regulating
SH3-mediated interactions (James et al., 2000). Investigations performed
by Rentschler et al. (1999) have also determined that the EF-hand regions
following the WW domain are necessary for WW binding. It was later
shown that the integrity of the utrophin WW-EF-ZZ region is essential
for efficient binding to β-dystroglycan (Tommasi di Vignano et al., 2000).
This binding activity can be abolished in utrophin if the ZZ domain is 
deleted, but only a reduction in binding is observed for dystrophin 
(Rentschler et al., 1999). 


E. ZZ Domain


Dystrophin and its autosomal homologue utrophin contain a putative 
zinc finger motif within their C-terminal cysteine-rich domains, homolo- 
gous to domains found in sequences of a wide variety of proteins (Ponting 
et al., 1996). The ZZ domains of dystrophin and utrophin have been shown 
to bind zinc (Michalak et al., 1996; Winder, 1997) and are believed to be 
involved in mediating protein-protein interactions, although the precise 
function of the ZZ domain has not yet been elucidated. It has been found 
that the cysteine-rich domain of dystrophin is required for binding to 
β-dystroglycan; it has been shown that the ZZ domain strengthens the
interaction of the dystrophin and utrophin WW-EF region with
β-dystroglycan (James et al., 2000; Jung et al., 1995; Rentschler et al.,
1999). More recently, Ishikawa-Sakurai and colleagues identified the com- 
ponents of the C-terminal domain of dystrophin that are required for the 
full binding activity. They have detailed the extent of the C-terminal 
sequence (residues 3026-3345) that is required for effective binding 
and have identified cysteine 3340 within the ZZ domain as essential to 
the binding activity with β-dystroglycan (Ishikawa-Sakurai et al., 2004). The
functional importance of the ZZ domain has been proven further by 
the identification of a rare mutation where C3340 has been mutated to 
a tyrosine, resulting in the affected individual suffering from a form of 
DMD (Lenk et al., 1996). However, to date, no structure of any ZZ domain
has been solved. C-terminal to the ZZ domain is a pair of highly con- 
served helices predicted to form dimeric coiled-coils (Blake et al., 1995). 
These helices, which are restricted to dystrophin family proteins (dystrophin,
utrophin, DRP2, and dystrobrevin), are involved in heterophilic
associations between family members and also members of the syntrophin
family of proteins (reviewed in Blake et al., 2002). 


VI. DISEASE 


A. Duchenne Muscular Dystrophy 


DMD is a severe X-linked recessive, progressive muscle wasting disease 
that affects approximately 1 in 3500 newborn males (Emery, 1991). An 
allelic variant of DMD is also known, referred to as Becker muscular dystrophy 
(BMD). It has a later onset and lesser phenotype than DMD, resulting in 
longer life expectancy (reviewed in O’Brien and Kunkel, 2001). DMD is 
caused by mutations in the DMD gene that encodes the cytoskeletal linker 
protein dystrophin. The vast majority of DMD mutations result in the 
complete absence of dystrophin, whereas a truncated protein is often
associated with the milder Becker form of the disease (Kingston et al., 
1983). Mutations in the genes encoding other components of the dystro- 
phin-associated protein complex cause other forms of dystrophy, such as 
limb-girdle and congenital dystrophies. 

The cause of approximately 65% of DMD pathologies can be traced to 
large deletions or duplications within the dystrophin gene. The remaining 
cases are the result of small insertion/deletion mutations and point 
mutations (Koenig et al., 1989; Monaco et al., 1985; reviewed by Roberts 
et al., 1994). In DMD, it has been found that point mutations nearly always 
result in a truncation of the open reading frame causing nonsense-mediated
decay, but rare cases are known where a truncated nonfunctional
protein is produced (Kerr et al., 2001). In BMD, most point mutations
disrupt splicing, which results in an intact but interstitially deleted
open reading frame and a partially functional protein (reviewed in 
Roberts et al., 1994). 

Mutations identified in all major domains of dystrophin result in disease 
phenotypes ranging from mild to severe. Beginning with the N-terminus, 
mutations have been identified that stem from missense and inframe 
mutations. In the ABD, a missense mutation resulting in an amino acid 
change of an arginine residue for leucine 54 results in a DMD phenotype 
with reduced levels of protein (Prior et al., 1995). DMD patients have also 
been described with inframe deletions of exons 3-25, indirectly resulting 
in normal levels of truncated protein (Vainzof et al., 1993). 

The rod domain of dystrophin has been found to accommodate large 
inframe deletions. A case where a patient was found to be missing exons 
17-48, corresponding to a 73% deletion of the rod domain, only exhibited 
a mild form of BMD (England et al., 1990). Large deletions of the rod 
domain have also been observed in other BMD patients (Love et al., 1991; 
Winnard et al., 1993), the phenotypes of which are usually milder than 
those of DMD. 

Few missense mutations have been described in DMD patients, although 
two informative substitutions have been identified in the cysteine-rich 
domain. The cysteine-rich domain contains a number of motifs that are 
important for regulation and protein-protein interactions. The substitu- 
tion of a conserved cysteine residue for a tyrosine at position 3340 results 
in reduced but detectable levels of dystrophin. This mutation alters one of 
the coordinating residues in the ZZ domain that is thought to interfere 
with the binding of the dystrophin-associated protein β-dystroglycan
(Lenk et al., 1996). Another substitution involving an aspartate to a
histidine at position 3335 is also thought to affect the β-dystroglycan
binding site (Goldberg et al., 1998). Removal of a highly conserved
glutamic acid (residue 3367) adjacent to the dystrophin ZZ domain results
in a phenotype of DMD with substantial retention of a presumably 
functionally compromised dystrophin protein (Becker et al., 2003). Inter- 
estingly, the cysteine-rich domain is never deleted in BMD patients, sug- 
gesting that this domain is critical for dystrophin function as the BMD 
phenotypes are less severe (Rafael et al., 1996). 

A small number of cases have been identified in which there is a 
deletion of the carboxy-terminus of dystrophin. In these patients, it is 
common for the mutant protein to localize to the sarcolemma (Bies et al., 
1992; Helliwell et al., 1992; Hoffman et al., 1991). These cases are good 
examples of the importance of the cysteine-rich and C-terminal domains 
of dystrophin, presumably reflecting the importance of interactions with 
components of the dystrophin-associated glycoprotein complex. Many 
single point mutations within dystrophin are also known. 


B. Hereditary Spherocytosis 


The membrane skeleton acts as an elastic semisolid, allowing brief 
periods of deformation followed by reestablishment of the original cell 
shape (reviewed by Bennett and Gilligan, 1993). Erythrocytes in the 
human bloodstream have to squeeze repeatedly through narrow capil- 
laries of diameters smaller than their own dimensions while resisting 
rupture. A functional erythrocyte membrane is pivotal to maintaining 
the functional properties of the erythrocyte. This importance is apparent 
when examination is made of many hemolytic anemias, where mutation of 
proteins involved in the structure of the submembranous cytoskeleton, 
and its attachment to the lipid bilayer, result in a malformed or altered 
cytoskeletal architecture and a disease phenotype. 

Hereditary spherocytosis (HS) comprises a group of inherited hemolytic 
anemias characterized by chronic hemolysis with a broad spectrum of 
severity (Hassoun et al., 1997). The principal cellular defect is the loss of 
erythrocyte surface area relative to the intracellular volume, although 
increased osmotic fragility is also a factor. A distinctive spherical red blood
cell (RBC) morphology is observed in sufferers of HS and splenic destruc- 
tion of these abnormal erythrocytes is the primary cause of the hemolysis 
experienced (Delaunay, 1995; Palek and Jarolim, 1993). 

The primary biochemical defects of HS are linked to proteins important 
to the interaction between the membrane skeleton and the lipid bilayer 
involving α- and β-spectrin, ankyrin, band 3, and protein 4.2 (Gallagher
and Forget, 1998). Combined spectrin and ankyrin deficiency (Coetzer 
et al., 1988; Pekrun et al., 1993; Savvides et al., 1993) is most commonly 
observed, followed by band 3 deficiency (Iolascon et al., 1992; Jarolim et al., 
1990; Lux et al., 1990), isolated spectrin deficiency (Agre et al., 1986; Eber
et al., 1990), and then protein 4.2 deficiency (Bouhassira et al., 1992; 
Ghanem et al., 1990; Rybicki et al., 1988). 

Spectrin forms an integral part of the erythrocyte cytoskeletal architec- 
ture; any defects that disrupt the association of the spectrin heterotetra- 
mer or the interaction with any of the other submembranous proteins can 
result in RBC defects (reviewed in Hassoun and Palek, 1996). Indeed, 
abnormalities of the β-spectrin N-terminus and the α-spectrin C-terminus
affect the self-association site and result in hereditary elliptocytosis and 
hereditary pyropoikilocytosis (Delaunay, 1995; Delaunay and Dhermy, 
1993; Palek and Jarolim, 1993). 

Defects outside the self-association site of spectrin are also associated 
with HS. In many sufferers of HS, both dominant and recessive forms of 
the disease result in spectrin deficiency. Normally α-spectrin is synthesized 
in a three- or fourfold excess over β-spectrin (Hanspal and Palek, 1987, 
1992). Heterozygotes for an α-spectrin mutation should still be able to 
produce enough normal α-spectrin to associate with the majority of 
β-spectrin present in the RBC. The deficiency would only become apparent 
in sufferers who are homozygotes or compound heterozygotes for α-spectrin 
gene mutations, where there would be insufficient α-spectrin to associate 
with β-spectrin. In the case of β-spectrin deficiency, mutations of the 
β-spectrin gene are associated with dominantly inherited HS because 
β-spectrin is the limiting component in heterotetramer formation 
(Gallagher and Forget, 1998). Redundant unassembled α-spectrin chains are 
degraded in the lysosomal compartment (Woods and Lazarides, 1985). As a 
result, the synthesis of β-spectrin would seem to be the rate-limiting 
factor for the assembly of the spectrin αβ heterodimers on the membrane. 
Therefore, in contrast to α-spectrin, defects of β-spectrin are more likely 
to be expressed in the heterozygous state (Hassoun et al., 1996). The rate 
of turnover of the β-spectrin subunit is viewed as critical to the 
posttranscriptional regulation of the heterodimer assembly on the membrane 
(Moon and Lazarides, 1983; Woods and Lazarides, 1985). 
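
The limiting-chain argument above can be illustrated with a small numerical sketch. It is only a toy model, not a description of measured synthesis rates: it assumes that heterodimer output equals the supply of the scarcer chain, that a null allele halves the output of its chain, and that a severe compound heterozygote retains 20% of normal α-spectrin output. The function name and the 20% figure are hypothetical choices made for illustration.

```python
def heterodimers(alpha_output, beta_output):
    """Heterodimers that can assemble from the two chain pools.

    Each heterodimer needs one alpha and one beta chain, so the scarcer
    chain sets the limit (toy model; excess chains are simply discarded).
    """
    return min(alpha_output, beta_output)


# Relative synthesis, with normal beta-spectrin output defined as 1.0.
# The text quotes a three- to fourfold alpha excess; 3.0 is the lower end.
NORMAL_ALPHA = 3.0
NORMAL_BETA = 1.0

# Assumed residual outputs (hypothetical values, for illustration only).
scenarios = {
    "normal": (NORMAL_ALPHA, NORMAL_BETA),
    "alpha heterozygote": (NORMAL_ALPHA * 0.5, NORMAL_BETA),
    "alpha compound heterozygote": (NORMAL_ALPHA * 0.2, NORMAL_BETA),
    "beta heterozygote": (NORMAL_ALPHA, NORMAL_BETA * 0.5),
}

results = {name: heterodimers(a, b) for name, (a, b) in scenarios.items()}
for name, dimers in results.items():
    print(f"{name}: {dimers:.2f} of normal heterodimer output")
```

Under these assumptions the α heterozygote still reaches full heterodimer output, whereas the β heterozygote falls to half, which mirrors the dominant inheritance pattern described for β-spectrin defects.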

The most common mutations associated with HS are those involving 
ankyrin. However, many mutations have been identified that involve 
spectrin, including splicing, missense, deletion, and insertion 
mutations (see references within Gallagher and Forget, 1998). Of 
these mutations, two were found in the second CH domain, W182G and 
I220V, with both residues being important for maintaining the hydrophobic 
core of the CH domain (Becker et al., 1993; Hassoun et al., 1997). 
Alteration of Trp 182 to Gly in the short helix B, in particular, would be 
expected to have a severe effect on the stability of the CH domain with 
consequent effects on the function of the whole ABD. Another isolated 
β-spectrin point mutation is also known and results in defective binding to 
protein 4.1 (Becker et al., 1993; Wolfe et al., 1982). Further investigation 
suggested that the mutant spectrin was likely to be unstable and susceptible 
to proteolytic degradation, leading to an overall spectrin deficiency 
(Becker et al., 1993). 

Mutations within the rod domain of β-spectrin have also been identified. 
A truncated β-spectrin lacking repeat 12 and part of repeat 13 is the 
product of a large genomic deletion (Hassoun et al., 1995). Studies of the 
mutated spectrin have shown that synthesis and stability were unaffected by 
the deletion, but its incorporation into the membrane skeleton was hampered 
in erythroblasts. The misincorporation is likely to be a result of defective 
binding to ankyrin (Hassoun et al., 1995), as the missing section of the rod 
domain is in close proximity to the ankyrin-binding domain (Kennedy 
et al., 1991). A second mutation of β-spectrin characterized by truncation 
and overall deficiency was found to involve a point mutation within intron 
17 (Hassoun et al., 1996). This mutation leads to an unstable transcriptional 
message that lacks exons 16 and 17. It was proposed that the mutant 
might be susceptible to proteolytic degradation in the cytoplasm of 
erythroblasts, ultimately contributing to a lack of spectrin at the membrane. 
Several other frameshift, missense, and nonsense mutations of the 
β-spectrin gene have also been discovered in patients suffering from HS 
(Hassoun et al., 1995). 


C. Familial Focal Segmental Glomerulosclerosis 


In humans, mutations in the ACTN4 gene located on chromosome 
19q13 result in α-actinin-4 mutations, which are believed to cause a form 
of familial focal segmental glomerulosclerosis (FSGS) (Kaplan et al., 2000). 
FSGS is a common nonspecific renal lesion characterized by regions of 
sclerosis in some renal glomeruli, often resulting in loss of kidney function 
and ultimately end-stage renal failure. FSGS is often secondary to other 
disorders such as HIV infection, obesity, hypertension, and diabetes, but 
can also appear as an isolated idiopathic condition. When FSGS occurs as 
a primary process, it is thought to result from a defect in glomerular 
podocyte function (Ichikawa and Fogo, 1996). α-Actinin-4 has been 
implicated in some cases of autosomal dominant FSGS due to the high 
expression level of this protein in the glomerular podocyte and also its 
early upregulation during the course of nephritic syndrome in some 
animal models (Drenckhahn and Franke, 1988; Shirato et al., 1996; 
Smoyer et al., 1997). 

Kaplan and colleagues (2000) have sequenced the coding region of 
ACTN4 from a number of families that present one form or another of 
FSGS. Three specific mutations were characterized (K228E, T232I, and 
S235P), all located on the solvent-accessible surface in helix G of 
the second CH domain (Kaplan et al., 2000). These mutations are not 
expected to perturb the secondary structure of the actin-binding domain. 
However, the affected residues are highly conserved among all four human 
isoforms of α-actinin, as well as α-actinins of other species. These 
mutations are not in a region of CH2 that is thought to directly interact 
with F-actin. However, their presence has a marked effect on the 
functionality of the mutant protein when in the cellular environment and 
when binding to F-actin. Actin-binding experiments using the mutant 
α-actinin have revealed that more of the mutant protein associates with 
F-actin than when the wild type is used. This observation leads to the 
conclusion that the ACTN4 mutations are able to cause or contribute to the 
susceptibility to focal segmental glomerulosclerosis (Kaplan et al., 2000). 
The increased affinity of α-actinin-4 for F-actin was also confirmed by 
Michaud et al. (2003). Further investigations performed by Yao and 
colleagues (2004) have demonstrated that the mutant α-actinins exhibit 
altered structural characteristics, localize abnormally, and are targets 
for degradation. They suggest that the mutant α-actinin-4 is much less 
dynamic within the cellular environment and, due to its propensity to 
aggregate, loss of normal function becomes inevitable and contributes to 
progression of kidney disease. It is possible that the effects of these 
mutations are more apparent in the kidney due to the high level of 
expression of α-actinin-4 in the podocyte (Drenckhahn et al., 1990; 
Kurihara et al., 1995). These dominant mutations may also alter the manner 
in which α-actinin interacts with other cytoskeletal components in 
conjunction with the effects on the normal assembly and disassembly of 
actin. 


VII. CONCLUSION 


The spectrin superfamily is a group of cytoskeletal proteins that have 
been found to perform a variety of cellular functions. The role of each 
protein and their interactions within the cellular environment stem from 
the specific domains found within each protein and the manner in which 
they are organized. Each of the family members is formed from discrete 
modular domains that have the ability to interact or modulate specific 
interactions or impart physical abilities on the protein relevant to its 
function. The particular members of this protein family have been shown 
to be evolutionarily related. α-Actinin is believed to be the ancestor of the 
whole group and, indeed, sequence and phylogenetic analysis has found 
this to be the case. It is astounding that from a simple precursor 
containing few domains such a family of functionally diverse proteins can be 
generated. It should also be noted that the domains present within 
the proteins of the spectrin superfamily have been adopted by many 
other protein groups, further supporting their suitability to perform the 
interactions that they have evolved. There is still much more to be learned 
from this group of proteins, including not only their structure-function 
relationships, but also the manner in which they fit within the cellular 
environment as a whole. 
ABSTRACT 


Fibrinogen is a large, complex, fibrous glycoprotein with three pairs of 
polypeptide chains linked together by 29 disulfide bonds. It is 45 nm in 
length, with globular domains at each end and in the middle connected by 
α-helical coiled-coil rods. Both strongly and weakly bound calcium ions 
are important for maintenance of fibrinogen’s structure and functions. 
The fibrinopeptides, which are in the central region, are cleaved by 
thrombin to convert soluble fibrinogen to insoluble fibrin polymer, via 
intermolecular interactions of the “knobs” exposed by fibrinopeptide 
removal with “holes” always exposed at the ends of the molecules. Fibrin 
monomers polymerize via these specific and tightly controlled binding 
interactions to make half-staggered oligomers that lengthen into protofi- 
brils. The protofibrils aggregate laterally to make fibers, which then 
branch to yield a three-dimensional network—the fibrin clot—essential 
for hemostasis. X-ray crystallographic structures of portions of fibrinogen 
have provided some details on how these interactions occur. Finally, the 
transglutaminase, Factor XIIIa, covalently binds specific glutamine resi- 
dues in one fibrin molecule to lysine residues in another via isopeptide 
bonds, stabilizing the clot against mechanical, chemical, and proteolytic 
insults. The gene regulation of fibrinogen synthesis and its assembly into 
multichain complexes proceed via a series of well-defined steps. Alternate 
splicing of two of the chains yields common variant molecular isoforms. 
The mechanical properties of clots, which can be quite variable, are 
essential to fibrin’s functions in hemostasis and wound healing. The fibri- 
nolytic system, with the zymogen plasminogen binding to fibrin together 
with tissue-type plasminogen activator to promote activation to the active 
enzyme plasmin, results in digestion of fibrin at specific lysine residues. 
Fibrin(ogen) also specifically binds a variety of other proteins, including 
fibronectin, albumin, thrombospondin, von Willebrand factor, fibulin, 
fibroblast growth factor-2, vascular endothelial growth factor, and interleu- 
kin-1. Studies of naturally occurring dysfibrinogenemias and variant mole- 
cules have increased our understanding of fibrinogen’s functions. 
Fibrinogen binds to activated αIIbβ3 integrin on the platelet surface, 
forming bridges responsible for platelet aggregation in hemostasis, and 
also has important adhesive and inflammatory functions through specific 
interactions with other cells. Fibrinogen-like domains originated early in 
evolution, and it is likely that their specific and tightly controlled intermo- 
lecular interactions are involved in other aspects of cellular function and 
developmental biology. 


I. INTRODUCTION 


Fibrinogen is a fibrous protein that was first classified with keratin, 
myosin, and epidermin based on its 5.1 Å repeat in wide-angle X-ray 
diffraction patterns (Bailey et al., 1943), which was later discovered to be 
associated with the α-helical coiled-coil structure. It is a glycoprotein 
normally present in human blood plasma at a concentration of about 2.5 g/L 
and is essential for hemostasis, wound healing, inflammation, angiogenesis, 
and other biological functions. It is a soluble macromolecule, but 
forms a clot or insoluble gel on conversion to fibrin by the action of the 
serine proteolytic enzyme thrombin (Fig. 1), which itself is activated by a 
cascade of enzymatic reactions triggered by injury or a foreign surface. A 
mechanically stable, protease-resistant clot is necessary to prevent blood 
loss and promote wound healing. 

Fig. 1. Basic scheme of fibrin polymerization and fibrinolysis. The clot is formed on 
the conversion of fibrinogen to fibrin by cleavage of the fibrinopeptides by thrombin, 
followed by stabilization of the network with isopeptide bonds by the transglutaminase 
Factor XIIIa. The clot is dissolved through proteolysis by the enzyme plasmin, which is 
activated on the fibrin surface by plasminogen activators. This process is controlled by 
several inhibitory reactions (black arrows). 

Fibrinogen is also necessary for the aggregation of blood platelets, an 
initial step in hemostasis. Each end of a fibrinogen molecule can bind with 
high affinity to the integrin receptor on activated platelets, αIIbβ3, so that 
the bifunctional fibrinogen molecules act as bridges to link platelets. In 
its various functions as a clotting and adhesive protein, the fibrinogen 
molecule is involved in many intermolecular interactions and has specific 
binding sites for several proteins and cells. 

Fibrin clots are dissolved by another series of enzymatic reactions 
termed the fibrinolytic system (Fig. 1). The proenzyme plasminogen is 
activated to plasmin by a specific proteolytic enzyme, typically tissue-type 
plasminogen activator or urokinase-type plasminogen activator. Plasmin 
then cleaves fibrin at certain unique locations to dissolve the clot. The 
activation of the fibrinolytic system is greatly enhanced by taking place on 
the surface of fibrin. Thus, these reactions are highly specific for cleavage 
of the insoluble fibrin clot, rather than circulating fibrinogen. 

There is a dynamic equilibrium between clotting and fibrinolysis, so that 
the conversion of fibrinogen to fibrin and the dissolution of the clot must 
be carefully regulated. Any imbalance can result in either loss of 
blood from hemorrhage or blockage of the flow of blood through a vessel 
from thrombosis. Thrombosis, often accompanying atherosclerosis or 
other pathological processes, is the most common cause of myocardial 
infarction and stroke. As a consequence, various fibrinolytics that activate 
plasminogen are now commonly used clinically to treat these conditions. 

The clinical significance of fibrin was first recognized by Virchow. In 
1859, Denis de Commercy proposed the existence of a precursor of fibrin 
that he called fibrinogen (Blombäck, 2001). Although Hammarsten first 
purified fibrinogen from horse plasma in 1876 (Blombäck, 2001), human 
fibrinogen was isolated in large quantities only fifty years ago (Cohn et al., 
1946) and has been extensively studied since then. Fibrinogen and the 
other components of the clotting system are commonly isolated from 
human blood plasma, using purification methods based on fibrinogen’s 
low solubility in various solvents or its isoelectric point (Blombäck and 
Blombäck, 1956; Cohn et al., 1946; Laki, 1951). 

Fibrinogen was last reviewed in this series in 1973 (Doolittle, 1973). 
However, many other reviews have appeared in the meantime, which 
have summarized various aspects of the biology and biochemistry of 
fibrinogen and fibrin (Blombäck, 1991; Budzynski, 1986; Doolittle, 1984; 
Greenberg et al., 2003; Hantgan et al., 2000; Henschen and McDonagh, 
1986; Mosesson et al., 2001; Shafer and Higgins, 1988; Tooney and Weisel, 
1979; Weisel and Cederholm-Williams, 1997). There has been an impres- 
sive amount of research on all aspects of fibrinogen and fibrin in the past 
30 years. This Chapter will provide an outline of some of this body of work 
with selected references to the relevant literature. 

As one indication of the progress, there have been generally accepted 
resolutions of the three fundamental controversies cited by Doolittle 
(1973), although of course not everyone agrees with all of these 
conclusions: (1) while there was then “no general accord on the shape of the 
fibrinogen molecule,” first electron microscopy and then X-ray 
crystallography have been used for the determination of the detailed 
structure of fibrinogen; (2) while “the nature of the forces holding subunit 
portions of fibrinogen together” was not known, biochemical and structural 
studies showed the bonds involved; and (3) while “the general location of the 
fibrinopeptides in the parent molecule” was undecided at that time, now 
we know where they are in the molecule, although they have not been 
resolved at atomic resolution. Moreover, we now know a great deal about 
the process of fibrin polymerization, fibrinogen synthesis, and interactions 
with plasma proteins and cells. 

TABLE I 
Physicochemical Characteristics of Human Fibrinogen 

Molecular mass                                   340,000 Da 
Molecular volume                                 3.7 × 10² nm³ 
Sedimentation coefficient (s20,w)                7.8 × 10⁻¹³ s 
Translational diffusion coefficient (D20,w)      1.9 × 10⁻⁷ cm²/s 
Rotary diffusion coefficient (Θ20,w)             40,000 s⁻¹ 
Frictional ratio (f/f0)                          2.34 
Partial specific volume                          0.72 cm³/g 
Extinction coefficient (A280, 1%)                15.1 
Intrinsic viscosity [η]                          0.25 dl/g 
Degree of hydration (g/g of protein)             6 
α-helix content                                  33% 
Isoelectric point                                5.5 


II. STRUCTURE AND PROPERTIES OF FIBRINOGEN 


A. Physico-Chemical Properties, Amino Acid Sequence, 
and Disulfide Bonding 


The physicochemical characteristics of fibrinogen reflect its nature as a 
fibrous protein (Table I). Fibrinogen is a large glycoprotein made up of 
three pairs of polypeptide chains, designated Aα, Bβ, and γ, with molecular 
masses of 66,500, 52,000, and 46,500 Da, respectively (Fig. 2). The 
posttranslational addition of asparagine-linked carbohydrate to the Bβ 
and γ chains brings the total molecular mass to about 340,000 Da. The 
nomenclature for fibrinogen, (Aα, Bβ, γ)2, arises from the designation of 
the small peptides that are cleaved from fibrinogen by thrombin to yield 
fibrin as fibrinopeptides A and B and the parent chains, without the 
fibrinopeptides, as α and β. No peptides are cleaved from the γ chains by 
thrombin. 

Fig. 2. Schematic diagram of the three pairs of polypeptide chains of fibrinogen. The 
Aα, Bβ, and γ chains are represented by bars with lengths proportional to the number of 
amino acid residues in each chain and the N- and C-terminal ends of the chains are 
labeled. The coiled-coil regions are indicated by the diagonally striped boxes, while the 
intra- and interchain disulfide bonds are indicated by solid lines. Carbohydrate 
attachment sites are labeled with CHO, while thrombin and major plasmin cleavage sites 
are indicated by T and P, respectively. (Adapted from Fig. 12-1 of Hantgan et al., 2000.) 

The entire amino acid sequence of all polypeptide chains of human 
fibrinogen has been determined by the methods of protein chemistry 
and later deduced from the nucleotide sequences of the cDNA coding 
for the polypeptide chains (Doolittle, 1984; Henschen and McDonagh, 
1986). There are 610, 461, and 411 amino acids in each of the common 
forms of the human Aα, Bβ, and γ chains, respectively. The amino acid 
sequences of the three chains are homologous, but important differences 
exist, giving rise to specific functions for certain molecular domains. 

All six chains are held together by 29 disulfide bonds (Henschen and 
McDonagh, 1986) to make two half molecules. There are 8, 11, and 10 
cysteine residues in the Aα, Bβ, and γ chains, respectively. The amino 
termini of all six chains are held together by disulfide bonds in the central 
domain (Fig. 2). Three interchain disulfide bonds link the two halves of 
the molecule together, one between the two Aα chains and two between 
the two γ chains. The A and B fibrinopeptides are also located at the 
N-terminal ends of the Aα and Bβ chains, respectively. A single interchain 
disulfide bond connects the Aα and Bβ chains within each half molecule. 
Unusual Cys-Pro-X-X-Cys sequences occurring twice in each chain 
are involved in disulfide ring structures, in which all three chains are 
linked together at each end of α-helical coiled coils (Doolittle, 1984). 
The remainder of the Aα chain contains one intrachain disulfide, while 
the Bβ chain contains three intrachain disulfides and the γ chain 
contains two. 
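
The chain figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below is a bookkeeping aid rather than part of the original analysis. It assumes that every cysteine is paired in a disulfide and that each disulfide ring contains three bonds (one per pair of chains); both are assumptions chosen because they are consistent with the stated total of 29. The dictionary keys and variable names are illustrative.

```python
# Per-chain values quoted in the text; each half molecule has one copy of each chain.
CHAINS = {
    "A-alpha": {"mass_da": 66_500, "residues": 610, "cysteines": 8, "intrachain_bonds": 1},
    "B-beta": {"mass_da": 52_000, "residues": 461, "cysteines": 11, "intrachain_bonds": 3},
    "gamma": {"mass_da": 46_500, "residues": 411, "cysteines": 10, "intrachain_bonds": 2},
}
HALVES = 2


def total(field):
    """Sum one per-chain field over all six chains of the molecule."""
    return HALVES * sum(chain[field] for chain in CHAINS.values())


polypeptide_mass = total("mass_da")             # protein only, no carbohydrate
residue_count = total("residues")
bonds_from_cysteines = total("cysteines") // 2  # assumes every cysteine is paired

# Independent tally from the bond types listed in the text.
BONDS_BETWEEN_HALVES = 3     # one A-alpha/A-alpha plus two gamma/gamma
A_ALPHA_B_BETA_PER_HALF = 1  # single interchain bond within each half
RINGS_PER_HALF = 2           # one ring at each end of the coiled coil
BONDS_PER_RING = 3           # assumption: one bond per pair of chains
bonds_from_tally = BONDS_BETWEEN_HALVES + HALVES * (
    A_ALPHA_B_BETA_PER_HALF
    + RINGS_PER_HALF * BONDS_PER_RING
    + sum(chain["intrachain_bonds"] for chain in CHAINS.values())
)

print(polypeptide_mass, residue_count, bonds_from_cysteines, bonds_from_tally)
```

Both routes give 29 disulfide bonds, and the polypeptide mass of 330,000 Da leaves roughly 10,000 Da to be accounted for by the carbohydrate that brings the total to about 340,000 Da.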


B. Domains 


The shape of fibrinogen and its organization into domains, or indepen- 
dently folded units, have been defined by a variety of physicochemical and 
structural techniques. Fibrinogen was one of the first biological macro- 
molecules to be visualized by electron microscopy (Hall and Slayter, 1959). 
Fibrinogen rotary shadowed with platinum was seen to be an elongated 
molecule about 46 nm in length with nodular regions at each end and in 
the middle connected by rodlike strands. Subsequent studies confirmed 
this model, although many conflicting models were proposed in the 
intervening years. Subdivision of the end nodules was demonstrated and 
the αC domains were localized (Fig. 3A; Erickson and Fowler, 1983; 
Mosesson et al., 1981; Weisel et al., 1981, 1985; Williams, 1983). 

The central region contains the two pairs of A and B fibrinopeptides 
and binds thrombin. The distal end nodule is the C-terminal γ chain (γC 
nodule) and is made up of three domains, while the proximal end nodule 
is the C-terminal Bβ chain (βC nodule), also made up of three domains 
(Medved, 1990; Rao et al., 1991; Weisel et al., 1985). The C-terminal 
portions of the two Aα chains (αC domains) extend from the molecular 
ends and interact with each other and with the central domains in 
fibrinogen (Gorkun et al., 1994; Veklich et al., 1993). 

Fig. 3. Transmission electron microscope images of fibrinogen molecules, 
protofibrils, and fibers. (A) Fibrinogen molecules rotary shadowed with a thin layer of 
metal, showing the 45-47 nm long elongated protein molecules, with a central domain and 
two nodules at each end, connected by thin rods. Magnification bar, 50 nm. (Image 
from Weisel et al., 1985.) (B) Negatively contrasted fibrin protofibrils, showing the half 
staggering of monomers and the twisting of two filaments of the protofibril. In negative 
contrast, protein excludes stain and these areas appear bright. Magnification bar, 
50 nm. (Image from Medved et al., 1990.) (C) Negatively contrasted fibrin fibers. Fibers 
have a repeat of 22.5 nm, with a distinctive band pattern that arises because areas with 
high protein density exclude stain and appear bright, while areas with lower protein 
density allow more stain to penetrate and appear darker. Note the trifunctional branch 
points, at which a portion of the fiber diverges from the remainder. The striations of the 
fibrin band pattern are aligned at the branch point. Magnification bar, 200 nm. (Image 
from Weisel, 2004b.) 

Differential scanning microcalorimetry has been used together with 
limited proteolysis to identify cooperative domains in fibrinogen. It was 
found that each terminal part of the fibrinogen molecule is made up of six 
cooperative domains, with three domains formed by the C-terminal part of 
the β-chain and three other domains formed by the C-terminal part of the 
γ-chain (Medved, 1990; Medved et al., 1986, 1997; Privalov and Medved, 
1982). The amino acid sequence of the αC domains suggested that each 
is made up of a globular and an extended portion. Microcalorimetry 
confirmed this result and showed that the two αC domains interact 
intramolecularly. 

Fig. 4. X-ray crystallographic structure of fibrinogen. The Aα chains are blue, while 
the Bβ chains are green, and the γ chains are red. Most of the C-terminal portion of the 
Aα chains is missing. The extent of fragments D and E, created by plasmin cleavage in the 
middle of the coiled coil, are indicated. The central domain is connected to the end 
regions via α-helical coiled-coil rodlike regions. The βC and γC nodules contain the 
holes that are complementary to the knobs in the central nodule, although the knobs 
uncovered by cleavage of the fibrinopeptides are not seen because they are disordered 
and/or flexible. Carbohydrate attachment sites are indicated by CHO. The atomic 
resolution structure of fibrinogen has been built up from X-ray crystallographic studies 
of human γC module and fragment D, bovine fragment E, modified bovine fibrinogen, 
and chicken fibrinogen. Where comparisons are available, it seems that there are few 
differences in structures of the fibrinogens from different species (Brown et al., 2000; 
Everse et al., 1998; Madrazo et al., 2001; Yang et al., 2001). (This model was kindly 
generated by Igor Pechik using atomic coordinates PDB ID 1M1J, which replaced the 
original entry 1JFE (Yang et al., 2001).) 

Several proteolytic fragments of fibrinogen have been important for 
defining its domain structure and identifying functional sites (Hantgan 
et al., 2000). The fibrinolytic enzyme plasmin dissolves the fibrin clot, but 
also cleaves fibrinogen at similar locations. Initial digestion removes the 
C-terminal Aα chains and Bβ 1-42, creating fragment X. Cleavages in all 
three chains then yield two D fragments and one E fragment (Fig. 4). 
Incomplete cleavage can produce a single D fragment plus a Y fragment 
(D + E). Through a variety of studies, the organization of each of these 
fragments has been defined. Fragment E contains the central region of 
fibrinogen and about half of each coiled coil extending on either side. 
Each fragment D consists of the remainder of one coiled-coil rod and the 
C-terminal portion of the Bβ and γ chains plus part of the Aα chain. The 
prevalence of studies using these fragments has led to extensive use of 
descriptive terms such as D domain or E domain or similar nomenclature, 
but it is important to note that these fragments are not domains; instead, 
each is made up of multiple domains, so they should be called D regions or 
E regions. 
C. Coiled Coils 


The central and end regions of fibrinogen are joined together by 
α-helical coiled-coil rods (see Chapter 3). There are 111 or 112 amino acid 
residues from each of the Aα, Bβ, and γ chains that are bordered by 
disulfide rings, as described above. Amino acid sequence predictions 
suggested the presence of coiled-coil structures that would account for 
the α structure first identified in wide-angle X-ray fiber diagrams of 
fibrinogen (Doolittle et al., 1978; Parry, 1978). In the coiled coil, most 
nonpolar residues point inward to form a hydrophobic core, while 
charged residues are exposed to solvent on the outside. Furthermore, 
these predictions suggested that this coiled-coil structure was interrupted 
in the middle in a region known to be susceptible to plasmin cleavage, 
with some other possible interruptions. The X-ray crystallographic 
structures of fragments D and E and of the entire fibrinogen molecule 
(described below) showed the coiled coil and revealed that there is a 
fourth strand for part of the coiled coil, formed from the Aα chain that is 
folded back after the distal disulfide ring. 


D. Carbohydrate 


Four biantennary oligosaccharide chains are linked to each molecule of 
fibrinogen by way of N-glycosyl bonds at Bβ364 and γ52. These 
carbohydrate attachment sites contain the NXS or NXT sequences that are 
typical of N-glycosylation. On the other hand, the Aα chains are devoid of 
any carbohydrate, despite the fact that two NXS sequences are present. 
It appears that all of fibrinogen’s carbohydrate chains are of biantennary 
structure. Variable desialylation, or removal of the N-acetylneuraminic acid, 
accounts for part of the heterogeneity of circulating fibrinogen. Fibrinogen 
isolated from human plasma contains equal amounts of mono- and 
disialylated chains, but no asialo chains (Townsend et al., 1982, 1984). 
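
Candidate NXS/NXT sites of the kind mentioned above can be located with a simple pattern search. The sketch below is illustrative only: the sequence is a made-up toy string rather than a fibrinogen chain, the function name is hypothetical, and the exclusion of proline at the X position follows the general sequon convention rather than anything stated in this chapter. As the carbohydrate-free Aα chains show, a matching motif marks only a potential site, not a glycosylated one.

```python
import re

# N-X-S/T motif; the lookahead lets overlapping motifs be reported.
# Excluding proline at X is an assumption taken from the general convention.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues that begin an N-X-S/T motif."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Toy sequence (hypothetical, for illustration only).
toy_sequence = "MKNGSAANPTQNVTLLNAS"
print(find_sequons(toy_sequence))
```

For the toy sequence this reports positions 3, 12, and 17, and skips the Asn at position 8 because it is followed by proline.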

The carbohydrate on fibrinogen has striking consequences for fibrin 
polymerization and clot structure. Patients with cirrhosis of the liver and 
some other liver diseases have fibrinogen with high levels of sialylation of 
their carbohydrate, leading to clots made up of thin fibers with many 
branch points (Martinez et al., 1983). These results are consistent with 
studies using neuraminidase to remove the sialic acid from the carbohy- 
drate of normal fibrinogen, producing clots made up of thicker fibers 
(Dang et al., 1989). Complete removal of carbohydrate has more dramatic 
effects on clot structure, resulting in clots made up of massive fibers 
(Langer et al., 1988). These results suggest that both charge and mass of 
the carbohydrate help to modulate the extent of lateral aggregation, and 
that the carbohydrate clusters significantly enhance (if not largely account 
for) fibrinogen’s solubility per se. 


E. X-Ray Crystallographic Structure 


Crystallization of large, flexible, heterogeneous proteins like fibrinogen 
is difficult. Bovine fibrinogen was first crystallized after proteolytic 
removal of C-terminal portions of the Aα chains with a bacterial enzyme 
possessing unique specificity (Tooney and Cohen, 1972, 1974). Both 
crystals and microcrystals of this modified fibrinogen were examined by 
electron microscopy and analyzed to provide information about fibrino- 
gen structure and the molecular packing in fibrin (Cohen et al., 1983; 
Weisel et al., 1978, 1981). These crystal forms are unusual in that they are 
made up of end-to-end bonded molecules that form flexible filaments. 
The fibrinogen molecule was demonstrated to be about 45 nm in 
length, containing a central region and two globular regions at each 
end connected by rods (Weisel et al., 1985). The distal end domain 
was identified as the C-terminal γ chain, while the C-terminal β chain 
was folded back near the coiled coil. Cryoelectron microscopy combined 
with X-ray crystallography yielded a low-resolution structure of fibrinogen 
(Rao et al., 1991). 

The atomic resolution structure of fibrinogen has been accumulated 
through crystallography of smaller fragments of this protein. The structure 
of the C-terminal portion of the y chain was solved, followed shortly by 
that of fragment D. Subsequently, the structure of a unique fragment E was 
solved, as well as atomic level structures of modified bovine fibrinogen and 
chicken fibrinogen (Fig. 4). 

The 30 kDa globular C-terminal portion of the human γ chain was 
expressed in yeast, crystallized, and its structure was determined (Yee 
et al., 1997). Three domains were identified in the structure, including a 
C-terminal fibrin-polymerization domain that contains a single calcium- 
binding site and a deep binding pocket for the complementary knob in 
the middle of the molecule, exposed by cleavage of the A fibrinopeptide. 
The C-terminal residues containing two factor XIIIa crosslinking sites and 
the platelet receptor recognition site were not resolved, probably because 
they are highly flexible and/or disordered. 

The structure of fragment D from human fibrinogen was solved 
and shown to consist of a coiled-coil region and the two homologous 
C-terminal γ- and β-nodules oriented at approximately 130 degrees to 
each other (Spraggon et al., 1997). The structure of the Factor XIIIa 
crosslinked dimer of fragment D (D dimer or double-D) was solved in 
the presence and absence of the peptide ligands that simulate the 
two knobs exposed by the removal of fibrinopeptides A and B, respectively. 
The peptide Gly-Pro-Arg-Pro-amide, an analog of the knob exposed by 
the thrombin-catalyzed removal of fibrinopeptide A, was found to reside 
in the γC holes; the peptide Gly-His-Arg-Pro-amide, which corresponds to 
the β-chain knob, was found in the homologous βC holes, demonstrating 
that the β-chain knob can potentially bind to a homologous hole on the 
C-terminal βC nodule (Fig. 4; Everse et al., 1998). The γC and βC holes are 
structurally very similar, but they are able to distinguish between these 
two peptides that differ by a single amino acid. However, in the absence of 
the peptide ligand Gly-Pro-Arg-Pro-am, Gly-His-Arg-Pro-am binds to the 
γC pocket. Additionally, the βC nodule, like its γC counterpart, binds 
calcium. Comparison of the structures revealed a series of conformational 
changes brought about by the various knob-hole interactions (Everse et al., 
1999; Kostelansky et al., 2002). In particular, a movable “flap” of two 
negatively charged amino acids (Gluβ397 and Aspβ398) has side chains 
that are pinned back to the coiled coil with a calcium atom bridge until 
Gly-His-Arg-Pro-am occupies the βC pocket. 

The structure of modified bovine fibrinogen provided information 
about the overall form of the molecule and the chain arrangement 
in the central region where the two halves of the molecule are joined, 
as well as the thrombin-binding site (Brown et al., 2000). However, the 
N-terminal ends of the α- and β-chains were not visible, likely because 
they are flexible and/or disordered. The coiled-coil region of 
fibrinogen has a planar, sigmoidal shape. Near the middle of the coiled- 
coil region, at the plasmin-sensitive segment, there is a hinge about which 
the molecule adopts different conformations, which may help accommo- 
date variability in the structure of the fibrin clot. An improved model 
of intermediates in fibrin polymerization was developed (Brown et al., 
2000). 

Since chicken fibrinogen has much smaller αC connectors, it was 
possible to crystallize it so that its structure could be solved (Yang et al., 
2001). The structure included the sigmoidally shaped coiled coil, as well as 
the planar disulfide rings in the middle. Both the αC domains and the 
N-terminal segments of the Aα- and Bβ-chains, including fibrinopeptides 
A and B, were disordered and/or flexible. The crystal structure of chicken 
fibrinogen complexed with two synthetic peptides was determined. In the 
coiled-coil regions, there is a strong correlation between mobility of 
residues and plasmin attack sites. 

The crystal structure of the central region of bovine fibrinogen, a 
fragment E, provided a view of the central region of the molecule 
and proximal portions of the coiled coil (Madrazo et al., 2001). The two 
halves of the molecule are joined together at the center in an extensive 
molecular “handshake” by using both disulfide linkages and noncovalent 
contacts. On one face of the fragment, the Aα- and Bβ-chains from the 
two half molecules form a funnel-shaped domain with a hydrophobic 
cavity. On the opposite face, the N-terminal γ chains fold into a separate 
domain. Despite the chemical identity of the two halves of fibrinogen, 
an unusual pair of adjacent disulfide bonds locally constrains the two γ 
chains to adopt different conformations. The striking asymmetry of this 
domain may promote the known twisting of the intermediates in fibrin 
polymerization (Weisel et al., 1987). 

The crystal structures of fragment D from several recombinant 
fibrinogens have demonstrated that these mutant molecules are excellent 
models for the study of structure-function relationships, since the poly- 
peptide chains are folded properly with only local changes (Kostelansky 
et al., 2002). Moreover, these studies allow testing of hypotheses 
about the effects of a particular mutation on structure and internal inter- 
actions within the molecule, which then can be correlated with the effects 
on fibrin polymerization or platelet aggregation (Kostelansky et al., 
2004a,b). 


F. Calcium Binding 


Fibrinogen has both strong and weak binding sites for calcium ions, 
which are important for its structural stability and functions. High-affinity 
binding sites for calcium are present in the γ chains, with residues Asp318, 
Asp320, Gly324, Phe322, and two strongly bound water molecules 
providing coordination in sites with amino acid sequence homologous 
to the EF hand calcium binding site in calmodulin (Spraggon et al., 1997; 
Yee et al., 1997). However, the calcium binding loop γ311-336 is not a 
classic EF hand, but a unique structure. The dissociation constant for 
calcium at these binding regions is such that these sites will be fully 
occupied in plasma fibrinogen. Although it was thought that there is a 
high-affinity calcium-binding site in the central region, none has yet been 
visualized. 

Some lower-affinity sites are associated with interactions of the 
C-terminal β chain with the coiled-coil backbone (Everse et al., 1999; Kostelansky 
et al., 2002). Other low-affinity binding sites are less well defined. There is 
evidence that some of these sites may be associated with the sialic acid 
residues on the carbohydrate chains (Dang et al., 1989). 

When calcium ions occupy the γ-chain high-affinity sites, the γ chains 
are protected from enzymatic degradation (Ly and Godal, 1973; Odrljin 
et al., 1996). The peptide Gly-Pro-Arg-Pro protects the molecule from 
plasmin digestion in a similar manner (Yamazumi and Doolittle, 1992). 
Calcium binding does not affect fibrinopeptide A cleavage by thrombin or 
batroxobin, but appears to be important for modulating fibrin polymeri- 
zation by enhancing lateral aggregation to form thicker fibers. There is a 
conformational change in fibrin associated with fibrinopeptide B cleavage 
that is calcium-dependent (Donovan and Mihalyi, 1985; Dyr et al., 1989; 
Mihalyi, 1988a). There is a change in affinity of calcium-binding sites 
associated with fibrinopeptide B release and this conformational change. 
Gly-His-Arg-Pro binds tenfold more strongly to fibrinogen in the presence 
of calcium than in its absence. This may be related to the low-affinity 
binding site for calcium in the βC nodule that is involved in the 
conformational change associated with binding of Gly-His-Arg-Pro (Everse et al., 
1999). 


III. BIOSYNTHESIS AND METABOLISM OF FIBRINOGEN 


Human fibrinogen is the product of three closely linked genes, each 
specifying the primary structure of one of the three polypeptide chains 
(Chung et al., 1983, 1990; Crabtree, 1987). The ability to express human 
fibrinogen in mammalian cells provides a means to study the synthesis and 
secretion of fibrinogen. Site-directed mutagenesis of fibrinogen has estab- 
lished the sequence of steps in fibrinogen assembly. There is a progression 
from single chains to two-chain complexes to trimeric half molecules, 
which then dimerize to form a whole fibrinogen molecule. The fibrinogen 
genes are located in a cluster in the distal third of chromosome 4, bands 
q23-q32 (Kant et al., 1985). The gene for the fibrinogen Aα chain, located 
in the middle of the gene cluster, consists of five exons that code for 625 
amino acids, but 15 amino acids are removed posttranslationally. An 
extension of an open reading frame into an alternately spliced sixth intron 
gives rise to a longer Aα chain (αE) in 1-2% of molecules in the adult 
(more in fetal blood), so that they contain an additional domain 
homologous to the C-terminal β and γ chains, leading to a fibrinogen molecule 
with a molecular mass of 420 kDa (Fu and Grieninger, 1994). The gene 
for the Bβ chain, containing eight exons, is located downstream to that 
of the Aα and γ chains and is transcribed in the opposite direction. The γ 
chain gene consists of ten exons, but alternative processing of the γ chain 
transcript leads to read-through of the exon IX/intron I splice junction, 
which produces 10-15% of plasma fibrinogen molecules with the γ′ chain, 
containing a 20-amino acid extension at the C-terminus (Chung and 
Davie, 1984; Wolfenstein and Mosesson, 1981). 

Fibrinogen synthesis is stimulated by hormones such as the 
glucocorticoid dexamethasone or suppressed by the estrogen estradiol-17β (Amrani 
et al., 1983; Wangh et al., 1983). The three fibrinogen genes share a 
common regulatory mechanism, as indicated by the highly coordinated 
expression of mRNAs, such that in hepatocytes the relative proportions of 
the mRNAs for the three chains are nearly the same (Crabtree, 1987). The 
regulation may involve similar sequence motifs in the immediate 5’ flank- 
ing regions and common transcriptional regulatory molecules (Hu et al., 
1995). In the absence of serum, the in vitro expression of the Aα chain is 
uncoordinated with expression of the Bβ and γ chains, suggesting that 
signal transduction factors in serum play a part in regulation (Plant and 
Grieninger, 1986). Some of the general and liver-specific transcription 
factors involved in the regulation of fibrinogen expression have been 
identified. 

Fibrinogen is one of the proteins upregulated during the acute phase 
response, with increases two- to tenfold, while albumin levels decrease by 
about 50% (Crabtree, 1987). Under stressful conditions, there are a series 
of reactions that result in cell activation and cytokine production. The 
cytokines then act on other cells, resulting in an increase in production of 
glucocorticoids, proliferation of immune cells, and changes in the levels of 
plasma proteins synthesized in the liver. The upregulation of the acute- 
phase response proteins is mediated by interleukin-6 (IL-6) (Hantgan et al., 
2000). IL-6 response elements are located on all three fibrinogen genes in 
the 5’ flanking regions. Glucocorticoid response regions are present in the 
5’ flanking region of the Bβ and γ genes. 

Fibrinogen is assembled in the rough endoplasmic reticulum of the liver. 
In human hepatocellular carcinoma cells, the newly synthesized Bβ chains 
are secreted and used more rapidly than the Aα and γ chains, so that 
a large intracellular pool of free Aα and γ chains exists (Yu et al., 1984). In 
contrast, the Aα chains are the limiting factor in cultured rat or chicken 
hepatocytes (Hirose et al., 1988; Plant and Grieninger, 1986). Stimulation 
by serum results in a 20-fold increase in secretion of fibrinogen and 
comparable levels of all three chains (Baumhueter et al., 1989). 

The assembly of polypeptide chains into functional fibrinogen mole- 
cules has been studied in several expression systems. The order in which 
the polypeptide chains are joined together by disulfide bonds has been 
determined and specific structural features that are important for assem- 
bly, such as the coiled-coil, have been identified (Xu et al., 1996). Substi- 
tution of the cysteines with serine showed that the interchain disulfide 
ring at the proximal end of the coiled-coil, in addition to the disulfides 
between the two halves of the molecule, are necessary for assembly of the 
two half molecules (Zhang and Redman, 1996). The distal interchain 
disulfide ring is not necessary for assembly of the two half molecules but 
is necessary for secretion. Disruption of intrachain disulfide bonds and 
deletions of portions of the polypeptide chains have revealed which 
disulfide bonds and noncovalent interactions are necessary for fibrinogen 
assembly. 

The liver is the primary source of plasma fibrinogen, with a steady state 
rate of synthesis of 1.7-5.0 g per day and a large reserve (Takeda, 1966). 
Three-quarters of the human fibrinogen is located in the plasma but it is 
also present in platelets, lymph nodes, and interstitial fluid. The half-life of 
fibrinogen is 3 to 5 days, but the catabolic pathway is different from the 
classic plasma protein pathways and is largely unknown. Coagulation and 
lysis account for only 2-3% of fibrinogen loss in healthy individuals 
(Nossel, 1976). Fibrinogen degradation products appear to play a role 
in regulation of the fibrinogen turnover (Nham and Fuller, 1986). 
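
As a rough consistency check on the synthesis rate and half-life quoted 
above, one can assume simple first-order catabolism. This is an assumption 
made here purely for illustration, since the catabolic pathway is largely 
unknown. Under that assumption the steady-state pool equals the synthesis 
rate divided by the rate constant ln 2 / t½. The function names are 
hypothetical. 

```python
import math


def rate_constant_per_day(half_life_days: float) -> float:
    """First-order loss constant k = ln(2) / t_half (assumed kinetics)."""
    return math.log(2) / half_life_days


def steady_state_pool_g(synthesis_g_per_day: float, half_life_days: float) -> float:
    """Pool size (g) at which first-order loss balances constant synthesis."""
    return synthesis_g_per_day / rate_constant_per_day(half_life_days)


if __name__ == "__main__":
    # Ranges quoted in the text: synthesis 1.7-5.0 g/day, half-life 3-5 days.
    for synthesis in (1.7, 5.0):
        for half_life in (3.0, 5.0):
            pool = steady_state_pool_g(synthesis, half_life)
            print(f"{synthesis} g/day, t1/2 = {half_life} d -> pool of about {pool:.1f} g")
```

The spread of the results simply reflects the width of the quoted ranges; 
the sketch says nothing about the actual route of catabolism. 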

Platelets contain fibrinogen but there is controversy over its origin, 
and whether it is structurally and functionally distinct from plasma 
fibrinogen. The γ chain variants of fibrinogen do not appear to be present in 
platelets (Haidaris et al., 1989); in one patient with a heterozygous dysfi- 
brinogen, the platelets contained only normal fibrinogen (Jandrot-Perrus 
et al., 1979). The megakaryocytes, from which platelets originate, are a 
possible source of fibrinogen synthesis. However, in subjects with 
hypofibrinogenemia, there are also lower levels of fibrinogen in platelets; 
infusion of fibrinogen results in a subsequent increase in platelet fibrino- 
gen (Harrison et al., 1989). Also, no fibrinogen mRNA was detected in 
megakaryocytes (Louache et al., 1991). Therefore, it seems that platelet 
and megakaryocyte fibrinogen arise primarily from αIIbβ3-mediated 
endocytosis of plasma fibrinogen and its storage in α granules. 

Fibrinogen is also synthesized in several extrahepatic sites, sugges- 
ting that it may function in other structural or adhesive capacities, but 
there is little evidence for this. Fibrinogen γ chain gene expression has 
been shown in vivo only in bone marrow, brain, and lung (Haidaris and 
Courtney, 1990). Epithelial cells from lung and intestine secrete small 
amounts of fibrinogen in a polarized manner from their basolateral face 
(Haidaris, 1997). It may be that the lung epithelium secretes fibrinogen 
and incorporates it into the extracellular matrix under certain pathologi- 
cal conditions, contributing to fibrotic lung disease. Synthesis of fibrino- 
gen by cultured granulosa cells may reflect a possible function for it in 
ovulation (Parrott et al., 1993). The apparent in vivo synthesis of fibrino- 
gen by trophoblasts (Galanakis et al., 1996) and the fact that the tropho- 
blast basement membrane consists largely of fibrin(ogen) suggest that 
these cells may secrete fibrinogen into their abluminal and/or intersti- 
tial environment as their need dictates, but the functional significance 
remains an open question. 


IV. CONVERSION OF FIBRINOGEN TO FIBRIN 


Fibrin polymerization is initiated by the enzymatic cleavage of the 
fibrinopeptides, converting fibrinogen to fibrin monomer (Fig. 1). Then, 
several nonenzymatic reactions yield an orderly sequence of macromolec- 
ular assembly steps. Several other plasma proteins bind specifically to the 
resulting fibrin network. The clot is stabilized by covalent ligation or 
crosslinking of specific amino acids by a transglutaminase, Factor XIIIa. 


A. Thrombin-Catalyzed Cleavage of Fibrinopeptides 


Thrombin, which is produced on proteolytic cleavage of prothrombin 
by Factor Xa in the presence of Factor V and phospholipid, is a serine 
protease with specificity for fibrinogen’s fibrinopeptides at particular Arg- 
Gly bonds. This specificity arises in part because of hydrophobic and 
structure-dependent interactions between the fibrinopeptides and throm- 
bin’s catalytic site, as well as noncatalytic binding of enzyme to substrate 
(Stubbs and Bode, 1993). Distinct regions of thrombin are involved in 
catalysis and fibrin binding. In addition to binding at the catalytic site, 
thrombin also binds to the central region of fibrinogen through ionic 
interactions with positively charged residues on the thrombin B chain, 
termed the anion-binding exosite I. Evidence for this binding site first arose 
from the observation that active thrombin, as well as inactivated forms, 
bind to fibrin. Solution of the structure of thrombin showed the details of 
the catalytic site and the exosites (Bode and Stubbs, 1993). 

Studies of the salt and temperature dependence of fibrinopep- 
tide release show that electrostatic interactions allow thrombin to bind 
at diffusion-controlled rates (De Cristofaro and Di Cera, 1992; Vindigni 
and Di Cera, 1996). After thrombin is bound, hydrophobic interac- 
tions result in a conformational change in this allosteric enzyme, con- 
verting it to a faster form with a higher catalytic efficiency (Guinto et al., 
1995). 

The crystal structure of a complex between thrombin and the E 
fragment corresponding to the central region of fibrinogen shows how 
the two molecules interact (Pechik et al., 2004). The complex consists of 
two thrombin molecules bound to opposite sides of the central part of 
fragment E, such that there is the correct orientation of their catalytic 
triads for cleavage of the fibrinopeptides of fibrinogen. As expected, 
binding occurs through thrombin’s anion-binding exosite I, but only 
part of it is involved in forming an interface with the complementary 
negatively charged surface of fragment E. 

A fibrin precursor, with one A fibrinopeptide removed and one intact, 
has been isolated (Shainoff et al., 2001). Although fibrinopeptide B release 
begins at the same time as that of the A fibrinopeptides, the experimental 
data are fit by a sequential model in which the B fibrinopeptides are 
released at a significant rate only after A fibrinopeptide cleavage (Lewis 
et al., 1985; Mihalyi, 1988b). With polymerization of fibrin, the rate of B 
fibrinopeptide release increases about sevenfold, which indicates that 
cleavage occurs preferentially from fibrin polymers or that there is a 
polymerization-induced conformational change that enhances release 
(Hanna et al., 1984; Martinelli and Scheraga, 1980). 
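
The sequential model can be sketched as two consecutive first-order steps, 
in which fibrinopeptide B is cleaved only from molecules that have already 
lost fibrinopeptide A. The rate constants in the usage loop are hypothetical 
values chosen for illustration, not measurements from the cited studies, 
and the sketch ignores the polymerization-dependent acceleration of B release. 

```python
import math


def fraction_a_released(t: float, k_a: float) -> float:
    """Fraction of fibrinopeptide A released after time t (first-order)."""
    return 1.0 - math.exp(-k_a * t)


def fraction_b_released(t: float, k_a: float, k_b: float) -> float:
    """Fraction of fibrinopeptide B released when B cleavage must follow A cleavage.

    Closed-form solution for the final species of two consecutive
    first-order reactions with rate constants k_a and k_b.
    """
    if math.isclose(k_a, k_b):
        # Limiting form when both rate constants coincide.
        return 1.0 - (1.0 + k_a * t) * math.exp(-k_a * t)
    return 1.0 - (k_b * math.exp(-k_a * t) - k_a * math.exp(-k_b * t)) / (k_b - k_a)


if __name__ == "__main__":
    K_A, K_B = 1.0, 0.3  # hypothetical rate constants, arbitrary time units
    for t in (0.0, 0.5, 1.0, 2.0, 5.0, 10.0):
        a = fraction_a_released(t, K_A)
        b = fraction_b_released(t, K_A, K_B)
        print(f"t = {t:4.1f}: A released = {a:.3f}, B released = {b:.3f}")
```

In this scheme B release starts at time zero but with zero initial slope, 
which reproduces the lag seen when the B fibrinopeptides are released at a 
significant rate only after A cleavage. 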

Substitution of particular amino acids in the fibrinopeptides and at 
thrombin’s cleavage sites in recombinant fibrinogen has been used to 
understand thrombin specificity, the relative rates of fibrinopeptide cleav- 
age, and effects on polymerization. If fibrinopeptide B is replaced with a 
fibrinopeptide-A-like peptide, fibrinopeptide release is similar to release 
from the Aα chain (Mullin et al., 2000). Thus, the ordered release of 
fibrinopeptides is dictated by the specificity of thrombin for its substrates. 

Thrombin binds strongly to fibrin in the clot or thrombus so that 
thrombin is sequestered and is less likely to be inactivated or prolong 
clotting (Mosesson, 2003). Release and exchange of bound thrombin for 
that in solution, as a result of fibrinolysis or rinsing, is much slower than 
thrombin adsorption. Fibrin has two classes of thrombin binding sites, 
differing in affinity. The high-affinity site has been shown to be on the γ′ 
chain of fibrinogen (Lovely et al., 2003; Meh et al., 2001). Studies of 
dysfibrinogenemias with mutations, such that clots do not bind thrombin 
as strongly, demonstrate the clinical significance of fibrin’s antithrombin 
activity, in that these patients tend to have serious thromboembolic 
disease. 


B. Polymerization Steps 


Although the fibrinopeptides constitute less than 2% of the mass of the 
fibrinogen molecule, their cleavage has profound consequences for the 
solubility of the resulting fibrin. Cleavage of the A fibrinopeptides exposes 
binding sites in the central domain (called A) that are complementary to 
sites (called a) always exposed at the ends of the molecules (Fig. 5) 
(Budzynski, 1986; Doolittle, 1984). The A “knobs” consist in part of the 
newly exposed N-terminus of fibrin’s α chain, Gly-Pro-Arg, but also include 
part of the Bβ chain. The a sites include “holes” in the C-terminal γ chain 
(Fig. 4). There are also B “knobs” exposed in the middle of the molecule 
on cleavage of the B fibrinopeptides after polymerization begins; these 
sites are complementary to b “holes” in the C-terminal β chain (Fig. 4) 
(Medved et al., 1993; Shainoff and Dardik, 1983). 

Fig. 5. Schematic diagram of complementary binding sites or knob-hole 
interactions in fibrin polymerization. Fibrinopeptides in the central domain cover knobs that 
are complementary to holes that are always exposed at the ends of the protein. When 
the fibrinopeptides are removed by thrombin, knob-hole interactions occur, giving rise 
to the two-stranded protofibril made up of half-staggered molecules. 

Specific interactions between the A:a complementary binding sites 
produce aggregates in which the fibrin monomers are half-staggered, 
since the central domain of one molecule binds to the end of the adjacent 
molecule (Fig. 5). Initially, a dimer is formed and then additional mole- 
cules are added to give a structure called the two-stranded protofibril (Fig. 3B) 
(Fowler et al., 1981; Medved et al., 1990). 
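
The geometry of half-staggering can be made concrete with a small sketch. 
It assumes a molecular length of 45 nm (twice the 22.5 nm axial repeat 
reported for fibrin fibers) and idealizes the protofibril as two straight 
strands of end-to-end molecules, one strand offset by half a molecule. 
The function names are hypothetical. 

```python
MOLECULE_LENGTH_NM = 45.0  # assumed: twice the 22.5 nm axial repeat


def strand_starts(n_molecules: int, offset_nm: float = 0.0,
                  length_nm: float = MOLECULE_LENGTH_NM) -> list[float]:
    """Axial start coordinates (nm) of molecules joined end to end in one strand."""
    return [offset_nm + i * length_nm for i in range(n_molecules)]


def central_domains(starts: list[float],
                    length_nm: float = MOLECULE_LENGTH_NM) -> list[float]:
    """Axial coordinates of the central domains (molecule midpoints)."""
    return [s + length_nm / 2.0 for s in starts]


def end_to_end_junctions(starts: list[float]) -> list[float]:
    """Axial coordinates where two molecules of a strand meet end to end."""
    return starts[1:]


if __name__ == "__main__":
    strand_1 = strand_starts(4)
    # The second strand is shifted by half a molecular length.
    strand_2 = strand_starts(4, offset_nm=MOLECULE_LENGTH_NM / 2.0)
    print("junctions in strand 1:      ", end_to_end_junctions(strand_1))
    print("central domains in strand 2:", central_domains(strand_2))
```

In this idealization each central domain of one strand lies opposite an 
end-to-end junction of the other, which is the arrangement expected when 
the central domain of one molecule binds to the end of the adjacent one. 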

Once protofibrils reach a sufficient length (usually about 600-800 nm), 
they aggregate laterally to form fibers (Fig. 6) (Hantgan and Hermans, 
1979; Hantgan et al., 1980). The intermolecular interactions that occur in 
lateral aggregation are specific so that the fibers have a repeat of 22.5 nm, 
or about half the molecular length, and a distinctive band pattern as 
observed by electron microscopy of negatively contrasted specimens 
(Fig. 3C) (Cohen et al., 1963; Weisel, 1986a) or X-ray fiber diffraction 
(Stryer et al., 1963). The band pattern directly reflects the molecular 
structure and packing in fibrin and indicates that fibers are paracrystalline 
structures with the molecules precisely aligned in the longitudinal direc- 
tion, but only partly ordered in the lateral direction (Freyssinet et al., 1983; 
Weisel, 1986a; Weisel et al., 1983). 

Fig. 6. Schematic diagram of fibrin polymerization. Fibrinopeptide A is cleaved from 
fibrinogen, producing desA fibrin monomers, which aggregate via knob-hole interactions 
to make oligomers. Fibrinopeptide B is cleaved primarily from polymeric structures. The 
oligomers elongate to yield protofibrils, which aggregate laterally to make fibers, a process 
enhanced by interactions of the αC domains. Factor XIIIa crosslinks or ligates γ chains 
more rapidly than α chains. Plasmin cleaves the αC domains and Bβ1-42 and then cuts 
across the fibrin in the middle of the coiled coil. At the bottom of the diagram, a branch 
point has been initiated by the divergence of two protofibrils. 

Snake venom enzymes that remove only the A fibrinopeptides have 
been used to show that cleavage of the B fibrinopeptides enhances lateral 
aggregation, producing thicker fibers (Blombäck et al., 1978). Other snake 
venom proteases that preferentially cleave the B fibrinopeptides yield clots 
only at low temperature (Shainoff and Dardik, 1979). Clots formed via 
cleavage of either only fibrinopeptide A or only fibrinopeptide B are made 
up of fibers with normal appearance, indicating that both the B knobs and 
the A knobs can participate in protofibril formation and neither is 
specifically associated with lateral aggregation (Mosesson et al., 1987; 
Weisel, 1986b). Studies with recombinant mutant fibrinogen have shown 
that an intact a site is necessary for protofibril formation with cleavage of 
the B fibrinopeptides, suggesting that B:a interactions may be involved 
(Hogan et al., 2001). There does not appear to be any clear evidence for 
B:b interactions in fibrin. 

The αC domains enhance lateral aggregation in fibrin polymerization 
and are important for the mechanical properties and stability of clots 
(Fig. 6) (Weisel and Medved, 2001). The αC domains can be seen as 
extensions or appendages of some fibrinogen molecules (Erickson and 
Fowler, 1983; Mosesson et al., 1981; Price et al., 1981; Weisel et al., 1985), 
but normally these regions are interacting with the central portion of the 
molecule (Veklich et al., 1993). On the cleavage of the B fibrinopeptides, 
there is a large-scale conformational change such that the αC domains 
dissociate from the central region and are available for intermolecular 
interactions (Veklich et al., 1993). Experiments with purified or 
recombinant preparations missing either one or both of the αC domains, αC 
fragments, and several dysfibrinogenemias indicate that intermolecular 
interactions between αC domains are important for the enhancement of 
lateral aggregation during fibrin polymerization (Gorkun et al., 1994; 
Veklich et al., 1993; Weisel and Medved, 2001). 

Lateral growth of fibers seems to be limited because protofibrils in 
the fibers are twisted (Fig. 7A), so that as the fiber diameter increases, 
they must be stretched to traverse an increasingly greater path length 
(Weisel et al., 1987). With this model, fiber growth stops when the energy 
required to stretch an added protofibril exceeds the energy of bonding. 
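
The geometric part of this argument can be illustrated with a short sketch. 
It assumes that a protofibril at radial distance r from the fiber axis 
follows a helix of fixed pitch P, so that its path length per turn is 
sqrt(P^2 + (2*pi*r)^2) and its fractional stretch relative to a protofibril 
on the axis is that length divided by P, minus one. The pitch value and the 
function names are hypothetical choices for illustration, not parameters 
taken from Weisel et al. (1987), and no energies are modeled. 

```python
import math


def helical_path_length(radius_nm: float, pitch_nm: float) -> float:
    """Length of one helical turn at a given radius for a fixed pitch."""
    return math.hypot(pitch_nm, 2.0 * math.pi * radius_nm)


def fractional_stretch(radius_nm: float, pitch_nm: float) -> float:
    """Extra path length per turn, relative to a protofibril on the fiber axis."""
    return helical_path_length(radius_nm, pitch_nm) / pitch_nm - 1.0


if __name__ == "__main__":
    PITCH_NM = 2000.0  # hypothetical helical pitch, for illustration only
    for radius in (0.0, 25.0, 50.0, 100.0, 200.0):
        stretch = fractional_stretch(radius, PITCH_NM)
        print(f"r = {radius:5.1f} nm -> stretch = {100.0 * stretch:.2f}%")
```

The required stretch grows monotonically with radius, so each added layer of 
protofibrils costs more strain than the last, consistent with growth 
stopping once that cost exceeds the energy of bonding. 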

The fibrils making up the fibers branch, leading to a three-dimensional 
network (Fig. 7A). This property of branching is essential for the proper- 
ties of fibrin since it leads to the production of a space-filling gel. Such 
a gel can be formed even with very low fibrinogen concentrations 
(<0.01 mg/ml protein), and clots made from 50 mg/ml fibrinogen are 
still 95% liquid. Branching can be initiated by a fibrin monomer that binds 
to one other monomer at its end, and then diverges from that molecule 
to interact with a second monomer in its central region, producing a 
trimolecular branch point (Mosesson et al., 1993). In addition, other 
branch points are simply points at which two parallel strands diverge from 
each other (Fig. 6) (Baradet et al., 1995; Hantgan and Hermans, 1979). In 
any case, electron microscopy has revealed that branch points are almost 
invariably trifunctional and are generally made up of fibers of similar 
diameter diverging at a small angle (Fig. 3C) (Ryan et al., 1999; Weisel, 
2004b). The physical and mechanical characteristics of clot networks vary 
greatly depending on the conditions of polymerization. 

Fig. 7. Scanning electron microscope images of fibrin clots. (A) Electron 
micrograph of clot formed by addition of thrombin to purified fibrinogen. A clot 
network is formed by branching of fibers, with trifunctional junctions, but in these 
images it is sometimes difficult to distinguish between true branch points and fibers 
crossing at different depths in the clot. Note that the twisting of protofibrils making up 
the fibers is visible on the surface of some fibers. Magnification bar, 5 µm. (Image from 
Weisel, 2004b.) (B) Electron micrograph of whole blood clot, made from freshly drawn 
blood with no additions. Fibrin fibers commonly originate from platelet aggregates and 
erythrocytes and leukocytes are trapped in the meshwork. Magnification bar, 10 µm. 
(Image from Yuri Veklich and John Weisel, University of Pennsylvania.) 

Interactions between fibrin and fibrinogen are likely to be important 
during early stages of polymerization, when fibrin monomers are 
present in small amounts and will exist primarily as soluble complexes with 
fibrinogen (Sasaki et al., 1966). Similarly, such complexes are likely to 
affect polymerization of clots made with low thrombin, such as by addi- 
tion of tissue factor to whole blood in the presence of contact phase 
inhibitors (Rand et al., 1996). In addition, thrombus growth in vivo could 
be affected by fibrinogen or fibrin degradation products by competing for 
polymerization sites. 

Studies of plasma clots made under a variety of conditions suggest 
that the network is established early as a result of the activation process 
(Blombäck et al., 1994) and that the thrombin activation pathway 
modulates clot structure (Torbet, 1995). Light scattering studies, combined 
with other physical chemical techniques, have substantiated this conclu- 
sion for purified fibrinogen and provided more information about the 
structure and properties of the assembling clot (Bernocco et al., 2000; 
Ferri et al., 2001, 2002; Profumo et al., 2003). 


C. Binding Pockets 


X-ray crystallography studies have defined the structure of the a and b 
binding pockets or holes in the C-terminal portions of the γ and β chains, 
respectively (Fig. 4). These binding pockets are always exposed, allowing 
even fibrinogen-fibrin interactions, as just mentioned. The a holes bind 
synthetic Gly-Pro-Arg-Pro peptides, while the b holes preferentially bind 
Gly-His-Arg-Pro peptides. However, depending on the conditions, such 
synthetic peptides may bind to the alternative binding pocket. 

The interactions of Gly-Pro-Arg-Pro with residues in the a pocket define 
minimum features of the binding, but the interaction site may be more 
extensive. Gly-Pro-Arg-Pro is held in the hole mainly by electrostatic inter- 
actions with residues including Gln329, Asp330, His340, and Asp364 
(Spraggon et al., 1997). In contrast to the small structural changes that 
accompany Gly-Pro-Arg-Pro binding to the a hole, a major conformational 
change accompanies binding of Gly-His-Arg-Pro to the b hole (Everse et al., 
1999; Kostelansky et al., 2002). The βC nodule, which is interacting with 
the coiled-coil calcium bridge, moves away and two carboxylate anions 
rotate inward to complete formation of the pocket, with accompanying 
release of calcium. 


D. Other Binding Interactions 


The half-staggered interactions present in fibrin also create a D-D 
contact at the end-to-end junction of two fibrin molecules (Fig. 6). This 
D-D interface is an important feature in most of the crystal forms of 
fibrinogen and its fragments, so it has been well characterized (Brown et al., 
2000; Spraggon et al., 1997; Weisel et al., 1978). At this interface, γArg275 
makes contacts with both γTyr280 and γSer300; γAsn308 and other 
residues are also involved in these interactions. 

There is evidence for the involvement of the N-terminal Bβ chain in 
fibrin polymerization (Pandya et al., 1985; Siebenlist et al., 1990). A 
recombinant fibrinogen with His substituted for Arg at the Bβ thrombin-cleavage 
site led to a 300-fold decrease in the rate of fibrinopeptide B release, 
whereas the rate of fibrinopeptide A release was normal (Moen et al., 
2003). As a consequence, thrombin- or batroxobin-catalyzed or desA 
monomer polymerization was impaired, due to the histidine substitution 
itself. Thus, it appears that the N-terminus of the Bβ chain is involved in 
the lateral aggregation of normal desA protofibrils. 


E. Covalent Crosslinking of Fibrin by Factor XIIIa 


The clot is stabilized by the formation of covalent bonds introduced by a 
plasma transglutaminase, Factor XIIIa (Greenberg et al., 2003; Henschen 
and McDonagh, 1986; Loewy et al., 2000; Lorand, 2001), which is necessary 
to make fibrin resistant to mechanical and proteolytic insults to prevent 
bleeding. Factor XIII is a 326 kDa tetrameric complex with two A chains 
and two B chains. In addition, about half of the total Factor XIII in blood 
arises from platelets, which contain dimers consisting of only two A chains. 
The three-dimensional structure of the recombinant A chain of human 
Factor XIII was determined by X-ray crystallography, showing that each A 
chain is folded into four domains: catalytic core, β sandwich, barrel 1, and 
barrel 2 (Yee et al., 1994). 

The active enzyme, Factor XIIIa, is generated from its precursor, Factor 
XIII, by the action of thrombin in the presence of calcium and fibrin 
(Fig. 1). The 37-amino-acid activation peptide prevents substrates from 
entering the active site pocket containing Cys 314, with the activation 
peptide from one A chain blocking the active site of the other A chain. 
Activation of Factor XIII involves thrombin cleavage of the A₂B₂ zymogen 
form of Factor XIII, followed by calcium-dependent dissociation of the A₂ 
subunits (Greenberg et al., 2003; Lorand, 2001). In plasma, fibrinopeptide 
A is cleaved more than 40 times faster than the Factor XIII activation 
(Greenberg et al., 1985). Factor XIII binds strongly via its B chain to 
fibrinogen, so that nearly all circulating zymogen is bound (Greenberg 
et ol, 2003). The primary site of binding of Factor XIII to fibrinogen 
appears to be the 7’ splice variant (Moaddel et al., 2000a,b; Siebenlist et al., 
1996). This interaction plays a role in maintaining an association between 
these two proteins from before clot formation. 

Fig. 8. Formation of isopeptide bond catalyzed by Factor XIIIa. The chemical 
reaction is catalyzed by Factor XIIIa, yielding insoluble fibrin crosslinked by 
Nε-(γ-glutamyl)lysine bonds. Factor XIII is activated to Factor XIIIa by 
thrombin in the presence of calcium ions and fibrin. 


Fibrin polymers are responsible for the fibrin-dependent enhancement 
of Factor XIII activation (Greenberg et al., 2003). The mechanism for this 
effect involves the formation of a tight ternary complex between fibrin, 
Factor XIII, and thrombin, accompanied by a conformational change of 
Factor XIII that exposes the active site, after which Factor XIIIa remains 
bound to fibrin. However, the B chains dissociate, which is necessary to 
expose the active site cysteine of plasma Factor XIII. Platelet Factor XIII, 
without the B chains, is more rapidly activated by thrombin than plasma 
Factor XIII because of the time that it takes for the B chains to dissociate. 

Many isopeptide bonds can be formed between the side chains of 
ε-lysine (donor) and γ-glutamine (acceptor) residues (Fig. 8). The γ 
chains of fibrinogen are crosslinked or ligated first, followed by the 
C-terminal α chains. In addition, other proteins (notably α2-antiplasmin, 
plasminogen activator inhibitor-2, and fibronectin) are also covalently 
ligated to fibrin by Factor XIIIa (Greenberg et al., 2003). 

The γ chain residues involved in Factor XIIIa-induced crosslinking, 
γGln398/399 and γLys406, are located near the C-terminal end of the γ 
chain (Greenberg et al., 2003). Factor XIIIa catalyzes the formation of 
crosslinks between the γ chains of fibrin at a much faster rate than it does 
between those of fibrinogen. It has been demonstrated by using a synthetic 
construct with two spaced Gly-Pro-Arg-Pro peptides that this effect is a 
result of physical proximity of the acceptor and donor sites in fibrin 
(Lorand et al., 1998). The crystal structure of crosslinked human D dimer 
does not show the end of the γ chains where the crosslinks are present 
because of the flexibility of this part of the molecule, but half of one 
crosslink is present in the structure of lamprey D dimer (Yang et al., 2002). 
There has been an ongoing controversy on whether the γ chains of fibrin 
are crosslinked longitudinally or transversely, and the evidence for both 
viewpoints has been summarized elsewhere (Mosesson, 2004a,b; Weisel, 
2004a,c). 

Each α chain contains potential glutamine acceptor sites at 221, 
237, 328, and 366, and donor sites at lysine 508, 539, 556, 580, and 601 
(Greenberg et al., 2003; Matsuka et al., 1996). Since the αC domains 
associate even in the absence of crosslinking, these interactions probably 
bring acceptor and donor sites in proximity, facilitating the formation of 
the isopeptide bonds. These bonds create a covalently connected network 
of αC domains, although little is known of its structure. In addition, there 
are lesser amounts of γ trimers, tetramers, and α-γ complexes. Factor XIII 
polymorphisms can have effects on the structure and properties of the 
fibrin clot (Ariëns et al., 2002). 


F. Mechanical Properties of Clots 


The mechanical properties of fibrin are essential for its functions in 
hemostasis and wound healing, since the clot must stop bleeding and 
yet allow the penetration of cells. The mechanical properties of a throm- 
bus will determine how it responds to flowing blood, including elastic or 
plastic deformation and embolization. Epidemiological studies have re- 
vealed a relationship between myocardial infarction and clot stiffness 
(Fatah et al., 1996; Scrutton et al., 1994). 

Many factors that affect clot structure have a great impact on fibrin’s 
mechanical properties. The thrombin concentration can have an effect on 
the clot structure, especially if the rate of generation of fibrin monomer by 
cleavage of fibrinopeptides becomes limiting relative to the rate of poly- 
merization (Blombäck et al., 1994; Weisel and Nagaswami, 1992). In fact, it 
appears that many factors (including pH, calcium and chloride ion con- 
centration, and other plasma proteins) exert an influence on clot struc- 
ture and mechanical properties by affecting the rates of various steps in 
the polymerization process (Carr and Hardin, 1987; Carr et al., 1985, 1986; 
Di Stasio et al., 1998; Weisel and Nagaswami, 1992). 

Fibrin is a viscoelastic polymer, which means that it has both elastic 
and viscous properties (Ferry, 1988). Thus, the properties of fibrin may 
be characterized by stiffness or storage modulus (representing its elastic 
properties) and creep compliance or loss modulus/loss tangent (repre- 
senting its inelastic properties). These parameters will determine how 
the clot responds to the forces applied to it in flowing blood. For example, 
a stiff clot will not deform as much as a less stiff one with applied stress. 
A clot with a greater inelastic component will deform permanently with 
stress, while one with a greater elastic component will return to its original 
shape. 
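
As a rough numerical illustration of these quantities, the sketch below combines a storage modulus (G′) and a loss modulus (G″) into the loss tangent, G″/G′, and the magnitude of the complex modulus. The function name and the sample values are hypothetical; they are not measurements from the studies cited here.

```python
import math


def viscoelastic_summary(storage_modulus, loss_modulus):
    """Return (loss tangent, |G*|) for storage modulus G' and loss modulus G''.

    Both moduli must be given in the same units (for example, pascals).
    """
    if storage_modulus <= 0:
        raise ValueError("storage modulus must be positive")
    # Loss tangent: ratio of the viscous (inelastic) to the elastic response.
    loss_tangent = loss_modulus / storage_modulus
    # Magnitude of the complex modulus G* = G' + iG''.
    complex_magnitude = math.hypot(storage_modulus, loss_modulus)
    return loss_tangent, complex_magnitude


# Hypothetical values in pascals, chosen only for illustration.
tan_delta, g_star = viscoelastic_summary(30.0, 4.0)
print(f"loss tangent = {tan_delta:.3f}, |G*| = {g_star:.2f} Pa")
```

A smaller loss tangent corresponds to a clot that behaves more elastically, which is the direction of change the text reports for Factor XIIIa ligation.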

The mechanical properties have been measured for many different 
types of clots under various conditions (Ferry, 1988). Both elastic and 
inelastic properties are very sensitive to small changes that affect polymer- 
ization and clot structure. Clots ligated with Factor XIIIa tend to have a 
higher stiffness and a lower inelastic component of deformation. At low 
strains, stress is directly proportional to strain so that the stiffness is 
constant, but at large strains the stiffness of the clot increases 20-fold 
(Janmey et al., 1983). This strain hardening or stiffening may be important 
biologically because it allows fibrin clots to be compliant at low strains and 
then become stiffer at higher strains so that they are not damaged. The 
mechanical properties are greatly affected by inclusion of other proteins 
from plasma and by the effects of platelets and other cells. 

The origin of clot elasticity is unknown, although it cannot be rubber- 
like because the clot has very different structural and mechanical proper- 
ties (Ferry, 1988). Comparisons of the mechanical properties of clots 
made under a variety of conditions correlated with the clot structures 
demonstrate empirical relationships that provide some clues to this long- 
standing puzzle (Ryan et al., 1999). The most likely explanation for the 
inelastic properties of fibrin is the slippage of protofibrils past each other, 
but there may be some role of breakage and reformation of noncovalent 
bonds (Weisel, 2004b). 


V. BINDING OF OTHER PROTEINS TO FIBRIN(OGEN) 


A. Fibronectin and Albumin 


Some proteins, such as plasma fibronectin and albumin, interact with 
fibrin to alter clot structure and properties, although the former becomes 
crosslinked to fibrin while the latter does not. As a result of these and 
other interactions, fibrin clots formed in plasma have very different prop- 
erties than those made with purified proteins (Blombäck et al., 1994; Carr, 
1988; Shah et al., 1987). Albumin has significant effects on the extent of 
lateral aggregation, yielding either thicker or thinner fibers depending on 
its concentration and other experimental conditions (Galanakis et al., 
1987; Torbet, 1986). 

Fibronectin is a large glycoprotein normally present in blood and 
connective tissue that is made up of a number of different domains 
serving as binding sites to various cells via specific integrin receptors 
and to a number of proteins, including fibrinogen, collagen, and gelatin 
(Ruoslahti, 1988). Fibronectin is important in cell adhesion and may affect 
wound healing. The binding region for fibronectin on fibrinogen has 
been identified (Makogonenko et al., 2002). Fibronectin has distinctive 
effects on clot structure, with fiber size and density affected by the 
concentration of fibronectin and whether or not it is covalently linked to 
fibrin with Factor XIIIa. 


B. Thrombospondin, von Willebrand Factor, and Fibulin 


Thrombospondin is a 540 kDa glycoprotein released from the α granules 
of stimulated platelets that has a variety of functions in coagulation, 
cell adhesion, angiogenesis, and inflammation. Thrombospondin appears 
to play a role in platelet aggregation by stabilizing the interactions be- 
tween fibrinogen and platelet integrins that are reversible initially (Bacon- 
Baguley et al., 1990; Leung, 1984). Thrombospondin is also incorporated 
into fibrin clots and affects clot structure and properties. Fibrinogen 
and thrombospondin form a reversible, noncovalent complex. Fragments 
of thrombospondin containing the interchain disulfide bonds retain the 
ability to bind to fibrinogen (Dixit et al., 1984). The corresponding region 
of fibrinogen that binds to thrombospondin appears to reside in either 
the αC domains (Tuszynski et al., 1985) or in the middle of the Bβ and γ 
chains (Bacon-Baguley et al., 1990). 

Von Willebrand factor is a large glycoprotein that is present in blood as 
multimers and is important in initiating the adhesion of platelets to sites 
of vascular damage at the beginning of formation of a platelet plug. Von 
Willebrand factor can bind to fibrin noncovalently and then can be 
crosslinked to fibrin with Factor XIIIa (Beguin and Kumar, 1997; Hada et al., 
1986). These interactions may be important in the formation of a platelet 
fibrin thrombus and in anchoring platelets to fibrin in flowing blood. 

Fibulin-1 is an extracellular matrix protein that is present in blood 
and was first isolated in a complex with fibrinogen, to which it binds with 
high affinity via the C-terminal end of the Bβ chain (Tran et al., 1995). 
Fibulin-1 can support platelet adhesion to extracellular matrix under flow 
conditions via a fibrinogen bridge (Godyna et al., 1996). 


C. Fibroblast Growth Factor-2, Vascular Endothelial Growth 
Factor, and Interleukin-1 


Fibroblast growth factor-2 binds to fibrin specifically at two classes of 
binding sites and has striking effects on endothelial cell proliferation 
(Sahni et al., 1998, 1999). Vascular endothelial growth factor binds 
specifically to fibrinogen and fibrin also at two classes of sites (Sahni and 
Francis, 2000). Interleukin-1β (but not interleukin-1α) binds to fibrinogen 
and fibrin at the same or similar sites as fibroblast growth factor-2 (Sahni 
et al., 2004). Since fibrin is part of the provisional matrix needed to 
support vascular cell responses for repair, the binding of these growth 
factors and cytokines will localize them to the site of injury so that they can 
enhance cellular responses needed for wound repair and angiogenesis. 


VI. FIBRINOLYSIS 


The clot is meant to be a temporary plug for hemostasis or wound 
healing, so there are natural mechanisms in the body for the efficient 
removal of fibrin. Various proteolytic enzymes and cells can dissolve fibrin 
depending on the circumstances, but the most specific mechanism in- 
volves the fibrinolytic system. The dissolution of fibrin clots under physio- 
logical conditions involves the binding of circulating plasminogen to 
fibrin, and the activation of plasminogen to the active protease, plasmin, 
by tissue-type plasminogen activator (t-PA), also bound to fibrin (Fig. 1). 
Highly efficient, specific fibrinolysis requires the ternary complex, 
fibrin-plasmin(ogen)-t-PA; activation of plasminogen is further stimulated by the 
initial cleavage of fibrin (Thorsen, 1992; Wiman and Collen, 1978). Plasmin 
then cleaves fibrin at specific sites, yielding certain identifiable soluble 
fragments, as described above (Fig. 6) (Marder et al., 1969). 

Both plasminogen and t-PA bind to fibrin, but new binding sites are 
created through partial cleavage by plasmin to create new C-terminal 
lysine residues (Suenson et al., 1984). This positive feedback mechanism 
accelerates lysis to ensure efficient degradation of the intact clot. As a 
result, during lysis initiated by the addition of t-PA to a clot, there is a 
region with a very high concentration of plasminogen, a very sharp lysis 
front, and a narrow lysis zone, where degradation occurs with movement of 
the fibers being degraded (Collet et al., 2000; Sakharov and Rijken, 1995; 
Sakharov et al., 1996). The lysine residues of partially digested fibrin bind 
to lysine binding sites in certain of the kringle domains of plasminogen 
and t-PA, but there are also lysine-independent binding sites, a finger 
domain in t-PA, and aminohexyl sites in plasminogen. Plasminogen binds 
near the end-to-end junction of two fibrin molecules, which accounts for 
the specificity of its binding to polymerizing fibrin rather than fibrinogen 
(Weisel et al., 1994). γ-Chain residues, including 312-324, bind t-PA; 
residues Aα148-160, with a lysine residue at 157, bind both t-PA and 
plasminogen with similar affinities (Nieuwenhuizen, 2001), but since the 
concentration of plasminogen is much higher in plasma, Aα148-160 serves 
as a plasminogen binding site. Structural studies revealed that Aα154-159 
is in a solvent-inaccessible region of the coiled coil (Spraggon et al., 
1997), but it may be exposed during fibrin polymerization (Medved and 
Nieuwenhuizen, 2003; Yakovlev et al., 2000). Similarly, there are sites that 
bind plasminogen and t-PA in the αC domain that are exposed in fibrin 
(Tsurupa and Medved, 2001). It has been suggested that t-PA initially 
binds to fibrin via its finger domain, while plasminogen binds via 
its aminohexyl sites, and then both molecules can bind via kringles as 
C-terminal lysines are generated (van Zonneveld et al., 1986). 
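
The concentration argument made above for Aα148-160 can be illustrated with the standard equilibrium expression for two ligands competing for a single site: when the dissociation constants are similar, occupancy is shared in proportion to the ligand concentrations. The sketch below assumes simple one-site equilibrium binding, which is a simplification of the situation in a clot; the function name, dissociation constants, and concentrations are placeholders, not values from the studies cited.

```python
def competitive_occupancy(conc_a, kd_a, conc_b, kd_b):
    """Fractions of one binding site occupied by ligands A and B at equilibrium.

    Concentrations and dissociation constants must share the same units.
    """
    if kd_a <= 0 or kd_b <= 0:
        raise ValueError("dissociation constants must be positive")
    term_a = conc_a / kd_a
    term_b = conc_b / kd_b
    denominator = 1.0 + term_a + term_b
    return term_a / denominator, term_b / denominator


# Placeholder numbers (arbitrary units): equal affinities, with ligand A
# (standing in for plasminogen) in large excess over ligand B (t-PA).
frac_a, frac_b = competitive_occupancy(2.0, 1.0, 0.0001, 1.0)
print(f"ligand A: {frac_a:.4f}, ligand B: {frac_b:.6f}")
```

With equal affinities, the ratio of the two occupancies equals the ratio of the two concentrations, so the more abundant ligand dominates the site.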

Antifibrinolytic compounds can block the conversion of plasminogen 
to plasmin, or directly bind to the active site of plasmin to inhibit 
fibrinolysis. The plasma protein, α2-macroglobulin, is a primary physiological 
inhibitor of plasmin. Plasmin released from fibrin is also very rapidly 
inactivated by α2-antiplasmin, which plays a role in the regulation of the 
fibrinolytic process (Aoki and Harpel, 1984). α2-Antiplasmin inactivates 
plasmin in a very rapid reaction, interferes with plasminogen binding to 
fibrin, and is ligated to fibrin by Factor XIIIa (Sakata and Aoki, 1980). 
After α2-antiplasmin is covalently linked to fibrin’s C-terminal α chain, it 
retains its ability to inhibit plasmin, a function that helps to prevent 
premature clot lysis. 

Fibrinolysis is also inhibited by plasminogen activator inhibitor type 1, 
which is a 50 kDa serine protease inhibitor released from platelets and 
endothelial cells in response to various stimuli. Plasminogen activator 
inhibitor-1 binds to fibrin via high- or low-affinity sites and can inhibit t- 
PA also bound to the clot (Keijer et al., 1991; Stringer and Pannekoek, 
1995). This is an important pathway for limiting fibrinolysis, especially in 
platelet-rich thrombi and regions near endothelial cells. 

Thrombin-activatable fibrinolysis inhibitor (TAFI) is a plasma protein 
that is activated by thrombin in the presence of thrombomodulin to 
a labile carboxypeptidase-B-like enzyme that inhibits fibrinolysis. When 
TAFIa is included in a clot undergoing lysis induced by t-PA and 
plasminogen, the time to achieve lysis is prolonged and free lysine and arginine 
are released (Wang et al., 1998). TAFIa retards the fibrin-enhanced 
activation of plasminogen by t-PA and inhibits the accumulation of plasminogen 
at the lysis front (Sakharov et al., 1997). 

Fibrinolysis may also be regulated by lipoprotein(a), or Lp(a), which 
consists of low-density lipoprotein (a lipid core with apolipoprotein B-100 
on the surface) and apolipoprotein(a) (Weisel et al., 2001). Apolipopro- 
tein(a) has great similarity to plasminogen, in that it has an (inactive) 
serine protease domain and a series of many kringles. Individuals have 
different concentrations of Lp(a) and isoforms with different numbers of 
kringles. Like plasminogen, Lp(a) binds tightly to fibrin, initially via the 
αC domains, and new binding sites are generated by partial digestion of 
the clot (Tsurupa et al., 2003). Lp(a) may compete with plasminogen, 
inhibiting its activation by t-PA and clot lysis (Angles-Cano et al., 1994; 
Edelberg et al., 1990; Harpel et al., 1989; Loscalzo et al., 1990). Other 
studies show that Lp(a) enhances binding of plasminogen and increases 
the rate of t-PA activation (Liu et al., 1994). 


VII. FIBRINOGEN BINDING TO INTEGRINS IN PLATELET 
AGGREGATION AND OTHER CELLULAR INTERACTIONS 


Fibrin(ogen) binds specifically to integrin receptors on platelets, endo- 
thelial cells, and many other cells and plays a vital role in platelet aggrega- 
tion and other aspects of cellular adhesion. Clot formation is a normal 
part of the wound healing process, sealing the injury and preventing 
bleeding. The fibrin clot serves as a scaffold for the migration of various 
cells, but less is known about this process than about earlier stages of 
healing. Fibroblasts migrate through the gel and deposit collagen, while 
macrophages and elements of the fibrinolytic system degrade the clot, 
with fibrin degradation products in turn acting on cells in the matrix. 

Fibrinogen plays an important role in platelet aggregation (Bennett, 
2001). The activity of circulating platelets is tightly controlled to prevent 
the spontaneous formation of platelet aggregates. Circulating platelets 
are inactive until they adhere to exposed subendothelial matrix proteins 
or are stimulated by soluble agonists such as ADP or thrombin. These 
activating events are associated with changes in platelet shape through 
reorganization of the platelet cytoskeleton, secretion of platelet granules, 
and an increase in the affinity of the integrin αIIbβ3 for soluble ligands, 
such as fibrinogen. Fibrinogen binding to platelets is responsible for 
platelet aggregation since it binds specifically to activated αIIbβ3, and 
the dimeric fibrinogen molecules bridge adjacent platelets (Fig. 9). The 
presence of platelets also has a dramatic effect on clot structure and 
mechanical properties (Fig. 7B) (Collet et al., 2002). 

Fibrinogen binds to stimulated platelets in a specific, calcium-dependent, 
saturable manner (Bennett, 2001). Residues in the γ chain of fibrinogen 
from 400-411 are necessary for binding to platelets. Fibrinogen also has 
two pairs of Arg-Gly-Asp sequences, in the Aα chain in the coiled-coil 
region (Aα 95-97) and in the carboxyl terminus (Aα 572-574). In spite of 
the fact that Arg-Gly-Asp sequences are a common motif for integrin 
binding, they do not appear to be important for platelet aggregation 
(Farrell et al., 1992; Weisel et al., 1992). Purified αIIbβ3 does not bind 
to the regions containing these sequences, but does bind to the distal ends 
of the fibrinogen (where the C-terminal γ chains are located). Furthermore, 
recombinant fibrinogen in which these sequences are mutated 
still supports platelet aggregation, whereas modification of the γ chain 
sequence eliminates platelet aggregation. 


Fig. 9. Fibrinogen binding to the integrin αIIbβ3 in platelet aggregation. Diagram 
of fibrinogen-αIIbβ3 interactions, with fibrinogen in red and αIIbβ3 in blue. Two 
activated αIIbβ3 complexes, each with separated tails in the membrane of a different 
platelet, bind to the C-terminal γ chain at the ends of a fibrinogen molecule, such that 
the fibrinogen molecule forms a bridge between adjacent platelets. The shape and 
orientation of the proteins in the complex are based on structural studies. (Adapted from 
Weisel et al., 1992.) 

In studies with transgenic mice lacking the C-terminal Ala-Gly-Asp-Val 
sequence, and with recombinant human molecules lacking Ala-Gly-Asp-Val, 
the fibrinogen failed to support platelet aggregation but supported normal 
clot retraction (Rooney et al., 1996, 1998). These and other results suggest 
either that different receptors are responsible for platelet aggregation and 
clot retraction or that the binding sites may be more complex (Remijn et al., 2001). 

Soluble fibrinogen also functions as a bridging molecule in other cell- 
cell and cell-extracellular matrix interactions in coagulation and inflam- 
mation. In addition, specific binding of fibrinogen to both integrin and 
nonintegrin proteins modulates the functions of many cells in a variety of 
receptor-mediated signal transduction processes. In addition to platelets, 
fibrinogen interacts with endothelial cells, leukocytes, fibroblasts, and 
epithelial cells. 

Endothelial cell adhesion to surface-immobilized fibrinogen results in 
structural changes to the cells, including microfilament reorganization 
and cellular spreading. These interactions appear to be mediated by αvβ3 
binding to the Arg-Gly-Asp sequence in the C-terminal Aα chain (Cheresh 
et al., 1989; Dejana et al., 1988; Francis et al., 1993; Suehiro et al., 1997). In 
addition, fibrin missing both A and B fibrinopeptides promotes angiogenesis, 
which is important during wound healing and tumor growth. Generation 
of fibrin on top of an endothelial cell monolayer causes rapid 
rearrangement of endothelial cells into a capillary network (Martinez 
et al., 2001). In this process, the N-terminal peptide β15-42 is involved in 
binding to VE cadherin on endothelial cells. 

Fibrinogen also binds to the leukocyte integrin αMβ2/Mac-1, which is 
important for the inflammatory response and binds multiple ligands, 
and binding of fibrin(ogen) appears to be particularly important for 
leukocyte function. A variety of studies have identified the binding site 
for αMβ2/Mac-1 as being in the C-terminal γ-module, with γ377-395 and 
other sequences being involved (Lishko et al., 2004; Ugarova et al., 1998, 
2003). To determine the significance of fibrin(ogen)-αMβ2 interaction 
in vivo, transgenic mice were generated in which the αMβ2 binding 
motif in the fibrinogen γ chain (390-396) was mutated, causing loss of 
αMβ2-mediated adhesion (Flick et al., 2004). The result was a severely 
compromised inflammatory response, showing that integrin-fibrin(ogen) 
binding is critical to leukocyte function and that fibrinogen is important in 
regulating the inflammatory response. 

Fibrinogen causes certain bacteria, such as Staphylococcus aureus, to 
clump. The C-terminal γ chain is necessary for the binding, with the 
dimeric fibrinogen molecule forming bridges between the cells (Strong 
et al., 1982). The clumping reactions are mediated by bacterial adhesins 
and are significant for bacterial colonization. 


VIII. DYSFIBRINOGENEMIAS, VARIANTS, AND HETEROGENEITY 
OF FIBRINOGEN 


Dysfibrinogenemias are characterized by structural changes in the 
fibrinogen molecule that result in detectable alterations in clotting or 
other properties of the molecule. Traditionally, they are named after 
the place of their discovery or where the patient lives. Most congenital 
defects are rare but offer the opportunity to study the effects of these 
molecular changes on fibrinogen function. These mutations are listed 
and described in several reviews (Galanakis, 1992, 1993; Matsuda and 
Sugo, 2001; McDonagh, 2000; Roberts et al., 2001). In addition, a data- 
base of all mutations is maintained on the website http://www.geht.org/ 
pages/database_fibro_uk.html. 

Molecular defects that give rise to dysfibrinogenemias are commonly 
caused by single base mutations that lead to the substitution of a single 
amino acid. Other mutations can give rise to a stop codon, resulting in a 
truncation of one of the chains. Finally, base deletions or additions may 
occur, with consequences for fibrinogen structure and function. Any such 
mutations can cause thrombosis, bleeding, or be asymptomatic, depend- 
ing on their functional effects. They can affect fibrinopeptide cleavage, 
fibrin polymerization, crosslinking, fibrinolysis, or platelet aggregation. 
Since dysfibrinogenemias are discovered in a clinical setting, there are 
nearly always some functional effects. 

In homozygous forms of the dysfibrinogenemias, the mutations occur in 
all fibrinogen molecules, so only abnormal homodimers will be present. 
However, homozygous mutations are very rare, so most dysfibrinogen- 
emias are heterozygous. As a consequence, fibrinogen molecules in most 
subjects consist of various proportions of mutant homodimers, normal 
homodimers, and heterodimers. 
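
If the two halves of each fibrinogen molecule were assembled at random from normal and mutant chains, the three species would follow binomial proportions. Random assembly is an assumption made here purely for illustration: the text states only that the proportions vary, and mutant chains may in practice be synthesized, assembled, or secreted less efficiently than normal ones. The function name and the mutant fractions used are hypothetical.

```python
def dimer_proportions(mutant_fraction):
    """Expected fractions of dimer species under random pairing of half-molecules.

    mutant_fraction is the fraction of half-molecules carrying the mutation.
    """
    if not 0.0 <= mutant_fraction <= 1.0:
        raise ValueError("mutant_fraction must lie between 0 and 1")
    q = mutant_fraction
    p = 1.0 - q
    return {
        "normal homodimer": p * p,
        "heterodimer": 2.0 * p * q,
        "mutant homodimer": q * q,
    }


# Hypothetical heterozygote with equal expression of both alleles,
# and one in which the mutant chain is under-represented.
for fraction in (0.5, 0.2):
    print(fraction, dimer_proportions(fraction))
```

Under this assumption, lowering the share of mutant chains reduces the mutant homodimer fraction much faster than the heterodimer fraction, one way in which the proportions could differ between subjects.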

Only a few examples of dysfibrinogenemias will be mentioned here. 
Some of the most common sites of the dysfibrinogens that have been 
identified are in the N-terminal Aα chain, in the regions involved in 
thrombin binding or cleavage or in the A knob. One frequently observed 
mutation is a substitution of Arg16 by His or Cys, with the former causing 
delayed release of fibrinopeptide A, but the latter completely blocking 
its release. If there is no fibrinopeptide A cleavage, fibrinopeptide B is 
released slowly and clots do form, but only at subambient temperatures 
(Shainoff and Dardik, 1979). Mutations of Gly17, Pro18, Arg19, or Val20 
result in defective polymerization because the A site is altered. An Arg554 
to Cys substitution in the Aα chain, with accompanying attachment 
of albumin to this free cysteine, leads to clots made up of very thin, 
slowly digested fibers, which probably accounts for the thrombosis that 
commonly is associated with this dysfibrinogenemia (Collet et al., 1993, 
1996). Several Aα chain truncations have been described; clots from these 
mutants are generally made up of very thin fibers, again because of the 
lack of αC domains that normally enhance lateral aggregation. Several 
mutations, including AαArg141 to Ser, create a new consensus sequence 
for glycosylation, with the result that the additional mass and charge affect 
polymerization (Woodhead et al., 1996). 

Mutations of the Bβ chain are less common. Substitution of Cys for Gly 
at amino acid 15 results in prolonged polymerization from delayed release 
of fibrinopeptide B (Yoshida et al., 1991). Mutation of Ala68 to Thr results 
in defective binding of thrombin to fibrin and consequent thrombosis 
(Koopman et al., 1992). Deletion of residues 9-72, which includes a 
cysteine that is normally part of a disulfide bond, results in delayed release 
of both fibrinopeptides A and B (Liu et al., 1985). 

In the γ chain, many mutations in the C-terminal portion have been 
identified, particularly at position 275 (Cote et al., 1998). This residue 
makes contacts at the D:D interface, so that mutations affect fibrin 
polymerization. Several mutations have been identified in residues that are 
part of the calcium binding site in the γ chain. Mutations in or near the a 
pocket are common, but patients with most of these substitutions are 
asymptomatic, presumably because they are heterozygous. Finally, other 
mutations in the γ chain that affect polymerization may be in regions 
involved in lateral aggregation. In addition, some mutations in any chain 
may cause functional effects because of improper folding of a particular 
part of the polypeptide chain. 

Some point mutations are responsible for congenital hypo- and afibrino- 
genemia, as a result of defects in molecular processing, assembly, secre- 
tion, and domain stability of fibrinogen. Afibrinogenemia is an autosomal 
recessive disorder characterized by the complete absence of detectable 
fibrinogen; analysis of the three fibrinogen genes in affected individuals 
has led to the identification of several causative mutations (Brennan et al., 
2001; Neerman-Arbez, 2001). Most of the cases identified so far were 
found to be caused by defective fibrinogen synthesis as a result of 
truncating mutations in the Aα chain gene. The graded severity of the hypo- and 
afibrinogenemias associated with homozygous Aα chain truncations 
suggests that the minimal requirement for molecular assembly is the formation 
of the distal disulfide ring of the coiled coil. Heterozygous muta- 
tions of certain residues that perturb the five-stranded beta sheet of the 
D region are absent from plasma fibrinogen. Other mutations can affect 
intracellular proteolysis, chain assembly, and export. 

As an animal model of afibrinogenemia, to examine the role of fibrinogen 
in hemostasis, wound repair, development, and disease pathogenesis, the 
fibrinogen Aα chain gene was disrupted in mice (Suh et al., 1995). 
Homozygous, Aα chain-deficient (Aα−/−) mice were born normal in appearance, 
and there was no evidence of fetal loss of these animals. None of the chains of 
fibrinogen were immunologically detectable in the circulation of adult 
Aα−/− mice, and blood samples did not clot or support platelet 
aggregation in vitro. Although overt bleeding events developed shortly after birth in 
about 30% of Aα−/− mice, most newborns eventually controlled the loss of 
blood. Juveniles and young adult Aα−/− mice were predisposed to 
spontaneous, fatal abdominal hemorrhage, but long-term survival was variable. 
Pregnancy uniformly resulted in fatal uterine bleeding around the tenth day 
of gestation. These results are consistent with human afibrinogenemic or 
severe hypofibrinogenemic patients, in whom gestation invariably fails dur- 
ing the first trimester. In these patients, gestation is salvaged by fibrinogen 
replacement throughout pregnancy, indicating the absolute need for fibrin- 
ogen and a dynamic fibrin deposition and removal process for normal 
placentation (Galanakis et al., 1996). 
Additionally, it was found that fibrin(ogen) is important for wound 
healing in cellular migration, organization, and establishment of wound 
stability and strength (Drew et al., 2001). Spontaneous bloodborne 
and lymphatic metastasis, but not primary tumor growth or angiogenesis, 
was diminished in fibrinogen-deficient mice (Palumbo et al., 2002). 
Plasminogen deficiency in mice results in severe thrombosis, in addition 
to delayed wound healing, spontaneous gastrointestinal ulceration, rectal 
prolapse, and wasting (Bugge et al., 1995; Romer et al., 1996). Mice 
deficient in both plasminogen and fibrinogen demonstrate that removal 
of fibrin(ogen) from the extracellular environment alleviates the diverse 
spontaneous pathologies associated with plasminogen deficiency and 
corrects healing times (Bugge et al., 1996). 

There are also some acquired dysfibrinogenemias from alterations to 
fibrinogen, usually as a result of disease processes. Liver disease (including 
cirrhosis, hepatitis, or hepatic carcinomas) is the most common cause of 
these modifications. In these cases, there is usually an increase in the sialic 
acid content of the fibrinogen’s carbohydrate (Martinez et al., 1983). 

Variants of fibrinogen are also present as a result of several common 
polymorphisms. The most widespread polymorphisms in the fibrinogen 
gene occur in the noncoding regions and can result in changes in plasma 
fibrinogen levels. Two common polymorphisms cause a change in the 
amino acid sequence of fibrinogen (Baumann and Henschen, 1993). In 
the Aα chain, a change from Thr to Ala at position 312 occurs at a frequency 
of about 23%. The other common polymorphism occurs at position 448 in the Bβ 
chain with substitution of Arg by Lys, at a frequency of 15%. Associations 
between disease and each of these polymorphisms have been established 
(Ariëns et al., 2002). The Aα312 polymorphism affects clot structure and 
properties (Standeven et al., 2003). 

There are many molecular forms of fibrinogen present in blood, as 
seen from variations in biochemical properties and gel electrophoretic 
behavior. It has been calculated that fibrinogen may occur in more than a 
million nonidentical forms in a healthy individual as a result of the many 
combinations of modified or polymorphic sites (Henschen-Edman, 2001). 
Several of these variations—such as sequence polymorphisms, carbo- 
hydrate content and structure, and splice variants—have already been 
mentioned. In addition, there is variation in phosphorylation at a few 
specific serine sites, proline hydroxylation, tyrosine sulfation, asparagine 
or glutamine deamidation, glutamine cyclization, and methionine oxidation. 
The C-terminal Aα chain is particularly susceptible to digestion by 
proteolytic enzymes, but some digestion also commonly occurs at specific 
sites in the Bβ and γ chains, and consequently lower molecular weight 
forms are nearly always present in plasma fibrinogen. Abnormal variants 
occur in patients with several conditions, including glycation of lysine 
residues in diabetes mellitus, crosslinked degradation products in 
disseminated intravascular coagulation and some types of cancer, and 
antibody complexes in certain autoimmune diseases. Tobacco smokers 
tend to have nitration of tyrosine and oxidation of methionine, histidine, 
or tryptophan residues, and individuals treated with acetylsalicylic acid 
have acetylated lysine residues. All of these polymorphic forms may show 
considerable differences in their functional properties. 
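
The "more than a million nonidentical forms" estimate quoted above is a combinatorial one, and the arithmetic can be sketched briefly. The example below is illustrative only: it assumes that each variable site is independent and has a small, fixed number of states, which is a simplification not stated in the text, and the function names are hypothetical.

```python
from math import prod


def count_forms(states_per_site):
    """Number of distinct forms if every site varies independently.

    `states_per_site` lists how many alternative states each site can
    take (assumed values, e.g. 2 for modified/unmodified).
    """
    return prod(states_per_site)


def min_binary_sites(target):
    """Smallest number of independent two-state sites giving more than `target` forms."""
    n = 0
    while 2 ** n <= target:
        n += 1
    return n


if __name__ == "__main__":
    # Twenty independent two-state sites already exceed one million forms.
    print(min_binary_sites(1_000_000))
    print(count_forms([2] * 20))
```

Under these assumptions, about twenty independently variable two-state sites are enough to exceed one million combinations, which makes the quoted estimate plausible given the number of modifications listed.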


IX. EVOLUTION OF FIBRINOGEN AND FIBRINOGEN-LIKE DOMAINS 


The thrombin-catalyzed clotting of fibrinogen to form a fibrin gel is 
common to all extant vertebrates. Because clots are necessary to stop 
bleeding, but can also cause thrombosis if not dissolved in a timely 
fashion, an effective scheme for fibrinolysis evolved concomitantly. The 
amino acid sequences of the Aα, Bβ, and γ chains are homologous, 
indicating that they have evolved from a common ancestor (Doolittle 
et al., 1997; Henschen et al., 1982). This homology even extends to the 
genomic sequences, with two intron/exon boundaries being conserved 
in all three chains; another one is common to both Bβ and γ chains 
(Medved, 1990). Taken together, all of this evidence suggests that the 
evolution of the three chains from a common ancestor through a series 
of duplications and inversions began about a billion years ago. It has 
been suggested that an ancestral gene duplicated to produce the α chain 
gene and a pre-β-γ gene (Doolittle et al., 1997; Henschen et al., 1982). 
The pre-β-γ gene then duplicated about 500 million years ago to form 
the β and γ genes. Zebrafish have all major hemostatic proteins, allowing 
some evolutionary predictions (Hanumanthaiah et al., 2002) and throm- 
bin has been used as a probe to investigate fibrinogen to fibrin conversion 
at different stages of embryonic development (Jagadeeswaran and Liu, 
1997). 

Amino acid sequence comparisons imply that fibrinogen evolved 
before the divergence of vertebrates and invertebrates, so it is reasonable 
to suppose that a protein with domains similar to those of fibrinogen 
might exist in invertebrates. A fibrinogen-like sequence was identified in 
cDNA prepared from the soft tissues of a sea cucumber, Parastichopus 
parvimensis (Xu and Doolittle, 1990). Two proteins corresponding to the 
cloned mRNAs, FReP-A and FReP-B, have sequences with similarity to 
the C-terminal two-thirds of vertebrate fibrinogen Bβ and γ chains. 
Comparisons of various fibrinogen-related sequences suggest that the sea 
cucumber proteins diverged before the β-γ gene duplication. 
Nucleotide sequences of fibrinogen-like proteins have been identified 
in a number of other genes, including the tenascin family of extracellular 
matrix proteins (Erickson, 1994; Jones et al., 1988), fibroleukin (Koyama 
et al., 1987), and the protein encoded by scabrous in the developing Drosophila eye 
(Baker et al., 1990; Lee et al., 1998). The existence of these fibrinogen- 
like domains suggests that the specific and tightly-controlled inter- 
molecular interactions that occur in fibrin polymerization and in plate- 
let aggregation may be used in other aspects of cellular function and 
developmental biology. 


X. CONCLUSIONS 


There is now much structural information about fibrinogen, inclu- 
ding its primary sequence, the chain connectivity via disulfide bonds, 
and organization into domains. There are X-ray crystallographic struc- 
tures of large parts of the molecule, but some key regions have not yet 
been visualized. Some aspects of calcium binding are beginning to be 
discovered, but less is known of the low affinity sites. 

Tremendous strides have been taken in understanding the metabolism 
and biosynthesis of fibrinogen, but there is still much to be done. Almost 
nothing is known of the catabolism of fibrinogen. The functions of the 
alternately spliced forms and the many posttranslational modifications are 
only partly known. Many dysfibrinogenemias have been discovered, but 
most have only been relatively superficially characterized. 

This is an exciting time for studies of fibrin polymerization. The basics 
have been outlined and it seems that we may be tantalizingly close to 
achieving a real understanding of the molecular mechanisms. Of the 
complementary binding sites, the holes have been seen but not the knobs. 
The holes with specific peptides have also been visualized, but not the 
protein-protein interactions in fibrin. Little is known of either the B or b 
interactions, or of the function of the αC domain. Structural changes 
during polymerization are only beginning to be discovered. Binding 
interactions in lateral aggregation are unknown. 

The mechanical properties of many types of clots have been measured, 
but the origin of clot viscoelasticity is a mystery. Factor XIIIa-induced 
crosslinking sites have been identified in the primary sequence, but the 
structure of the crosslinked C-terminal γ chains is a matter of debate and 
that of the α chains is unknown. The biochemistry of fibrinolysis has been 
determined, but less is known of the physical mechanisms involved or of 
the interactions of regulatory systems. 

Binding interactions of several plasma proteins to fibrin(ogen) have 
been identified, but the functional consequences of most of them are not 
well characterized. The interaction of fibrinogen with integrins on cells is 
an extremely active field of study that was only touched upon briefly in this 
review. Fibrinogen binding to platelets continues to be a model system for 
studies of cell adhesion. 

The involvement of fibrinogen in biological functions beyond hemosta- 
sis and thrombosis is a wide open field. Nearly all research has been 
carried out with human fibrinogen, with isolated studies of a few other 
mammals, frog, chicken, zebrafish, and lamprey. There have been few 
comparative and evolutionary studies. 


ABSTRACT 


The molecular conformation of the collagen triple helix confers strict 
amino acid sequence constraints, requiring a (Gly-X-Y)n repeating pattern 
and a high content of imino acids. The increasing family of collagens and 
proteins with collagenous domains shows the collagen triple helix to be a 
basic motif adaptable to a range of proteins and functions. Its rodlike 
domain has the potential for various modes of self-association and the 
capacity to bind receptors, other proteins, GAGs, and nucleic acids. High- 
resolution crystal structures obtained for collagen model peptides confirm 
the supercoiled triple helix conformation, and provide new information 
on hydrogen bonding patterns, hydration, sidechain interactions, and 
ligand binding. For several peptides, the helix twist was found to be 
sequence dependent, and such variation in helix twist may serve as re- 
cognition features or to orient the triple helix for binding. Mutations in 
the collagen triple-helix domain lead to a variety of human disorders. The 
most common mutations are single-base substitutions that lead to the 
replacement of one Gly residue, breaking the Gly-X-Y repeating pattern. 
A single Gly substitution destabilizes the triple helix through a local 
disruption in hydrogen bonding and produces a discontinuity in the 
register of the helix. Molecular information about the collagen triple helix 
and the effect of mutations will lead to a better understanding of function 
and pathology. 


I. INTRODUCTION 


Collagen has always been considered to be an important protein be- 
cause of its abundance in the human body and its commercial applica- 
tions. Fifty years ago, the molecular conformation for collagen was 
proposed to be a novel triple-helix structure. Fiber diffraction analysis 
and model building, together with early amino acid composition and 
sequence data, led to the concept of three chains, each in a polyproline 
II-like conformation, supercoiled about a common axis (Ramachandran, 
1967; Ramachandran and Kartha, 1955; Rich and Crick, 1955, 1961). The 
close packing of the three chains near the common axis places steric 
constraints on every third position, such that only glycine can be 
accommodated without chain distortion. This generates the (Gly-X-Y)n 
repeating sequence, recognized as the signature of a collagen. Residues in 
the X and Y positions are largely solvent accessible and can accommodate any 
sidechain. In fact, imino acids are particularly favorable because their 
fixed φ angle and restricted ψ angle are close to those found in the triple helix. 
When Pro is incorporated into the Y position in a collagen chain, it 
becomes posttranslationally modified to hydroxyproline (Hyp), which 
has a highly stabilizing effect on the triple helix. 

Early on, it was realized that peptides could be useful models for the 
features of collagen. Polyglycine and polyproline played an important role 
in elucidating the triple-helix structure (Cowan et al., 1955; Rich and 
Crick, 1955), and polytripeptides were designed to adopt the collagen 
conformation. Initially, the peptides were heterogeneous polymers, but with 
the advent of solid-phase peptide synthesis, peptides of defined length and 
sequence could be synthesized to clarify principles of triple-helix stability 
and mimic biological activity (Fields and Prockop, 1996; Goodman et al., 
1998; Heidemann and Roth, 1982; Jenkins and Raines, 2002). Most pep- 
tides are studied as monomers that self-associate into trimers, while, in 
some cases, crosslinked trimers are synthesized to facilitate trimer forma- 
tion, stabilize the triple helix, or create heterotrimers. Triple-helical pep- 
tides are proving more amenable to structural characterization than 
the longer native collagen molecules, and peptide investigations have 
led to clarification of the molecular details of collagen structure by 
X-ray crystallography, nuclear magnetic resonance (NMR), and other 
biophysical techniques. 

Because of their strict structural constraints, triple-helix domains can be 
identified on the basis of amino acid sequence. With many amino acid 
sequences of proteins available through molecular biology, there is clearly 
a growing family of collagen proteins with a common triple-helix motif 
present in diverse tissue structures and an expanding collection of colla- 
gen-like domains in various other proteins. The collagen triple helix can 
form a straight or kinked rod, has the capacity for self-association into 
various supramolecular structures, and has the potential to bind ligands 
and receptors. These structural features of collagen have adapted to a wide 
range of biological roles, and now the triple-helix motif is more appropri- 
ately viewed in a general context as a basic and versatile protein motif. 
Additionally, a growing number of mutations in collagens and collagen-like 
domains have been associated with specific diseases. The need to under- 
stand the effects of these mutations has lent a growing impetus to character- 
ization of the normal and variant forms of molecular conformation and 
higher order structure. 
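
The idea that triple-helix domains can be recognized from sequence alone can be illustrated with a minimal sketch. It assumes one-letter amino acid codes and treats any uninterrupted run of Gly-X-Y triplets above an arbitrary length threshold as a candidate domain; the function name and the default threshold of five triplets are hypothetical choices, not values taken from this review.

```python
import re


def find_triple_helix_runs(sequence, min_triplets=5):
    """Return (start, end, n_triplets) for each uninterrupted (Gly-X-Y)n run.

    `sequence` is a one-letter amino acid string. X and Y may be any
    residue, so only the Gly-at-every-third-position rule is enforced.
    """
    pattern = re.compile(r"(?:G[A-Z]{2}){%d,}" % min_triplets)
    runs = []
    for match in pattern.finditer(sequence.upper()):
        length = match.end() - match.start()
        runs.append((match.start(), match.end(), length // 3))
    return runs


if __name__ == "__main__":
    # Toy sequence: a short non-collagenous lead, six Gly-Pro-Pro triplets, a tail.
    toy = "MKT" + "GPP" * 6 + "AAA"
    print(find_triple_helix_runs(toy))
```

A scan of this kind only finds perfect repeats. Interruptions in the pattern will end a run or shift its apparent register, so a practical search would need to tolerate breaks.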

This review will focus on recent advances in the molecular structure of 
the collagen triple helix, including information from high-resolution 
structures, the effect of amino acid sequence on stability, ligand binding, 
and disease. Important articles on the molecular properties of collagen 
were published in earlier volumes of this series (Harrington and von 
Hippel, 1961; Privalov, 1982; Traub and Piez, 1971). There are excellent 
reviews of earlier work on the molecular structure of collagen (Fraser and 
MacRae, 1973; Fraser et al., 1987; Ramachandran, 1967) and several 
comprehensive overviews of collagens, their structures, assembly, and 
biochemistry (Kielty and Grant, 2002; Myllyharju and Kivirikko, 2004). 


II. THE BIOLOGICAL ROLE AND OCCURRENCE OF THE COLLAGEN 
TRIPLE-HELIX MOTIF IN PROTEINS 


Collagens are defined as structural molecules in the extracellular matrix 
that contain a triple-helix domain; at this time, 27 distinct types of 
human collagens have been identified (Fig. 1; Table I; Kielty and Grant, 
2002; Myllyharju and Kivirikko, 2004). These different collagen types carry 
out specialized functions in diverse tissues and have distinctive modes of 
supramolecular organization. Some molecules are homotrimers, while 
others are heterotrimers, with two or three distinguishable chain types. 
The most abundant of these are found in characteristic collagen fib- 
rils (major types I, II, III, and minor types V and XI), which form the 
structural basis of skin, tendon, bone, cartilage, and other tissues. Other 
collagens are found on the surface of fibrils (FACIT types IX, XII, XIV, 
XVI, XIX, XX, XXI, XXII, XXVI); within basement membrane networks 
(type IV collagen); in hexagonal networks (types VIII and X); as 
beaded filaments (type VI); in the anchoring fibrils of skin (type VII); or as 
membrane proteins (types XIII, XVII, XXIII, XXV). Type XVIII collagen 
is a basement membrane heparan sulfate proteoglycan, and contains a 
C-terminal noncollagenous fragment, endostatin, which inhibits 
angiogenesis and tumor growth (Kawashima et al., 2003). A thorough discussion 
of the various collagen types, their structures, and their functions is 
included in several excellent reviews (Kielty and Grant, 2002; Myllyharju 
and Kivirikko, 2004). 


Fig. 1. Illustration of some of the biological forms of the collagen triple-helix 
domains. [Panel labels: Fibrillar Collagens; Type VII collagen; Anchoring Fibril; membrane.] 


TABLE I 
Occurrence of Proteins with Collagen Triple-Helix Domains 

Protein                       Supramolecular structure    Tissue                  No. of (GXY)n   Breaks 
Vertebrates 
Type I collagen               D-periodic fibril           Tendon, bone, skin      338             0 
Type II collagen              D-periodic fibril           Cartilage, vitreous     338             0 
Type IV collagen              Network                     Basement membrane       437             21 
Type VII collagen             Antiparallel                Anchoring fibrils       472             20 
Type IX collagen              Surface of fibrils          Ubiquitous              85              4 
Type XII collagen             (FACIT)                                             197             5 
C1q                           Hexamer                     Serum, complement       24              1 
MBP                           Oligomer                    Serum                   19              1 
SP-A                          Hexamer                     Lung surfactant         23              1 
SP-D                          Tetramer                    Lung surfactant         59              0 
MSR                           Membrane                    Macrophage              24              0 
Invertebrates 
C. elegans cuticle collagen                               Exoskeleton             ~9, ~40         1-4 
Drosophila type IV collagen   Network                     Basement membrane       444             17 
Hydra minicollagen            Disulfide linked polymers   Nematocyst inner wall   12-16           0 
Bacteria and Viruses (120 types) 
Scl                           Cell surface                Streptococcus group A   50              0 
BclA                          Exosporium filament         Bacillus anthracis      70              0 
Lymphocystis disease virus    Unknown                     Unknown                 55              1 
Shrimp white spot virus       Unknown                     Unknown                 389             1 


In addition to being the defining feature of collagens, the collagen 
triple helix is present as a motif in a variety of proteins, many of which are 
involved in host-defense functions (Fig. 1; Table I; Kielty and Grant, 2002; 
Lu et al., 2002; Matsuzawa et al., 2004; Myllyharju and Kivirikko, 2004; 
Shirai et al., 1999). The kinked triple-helix domain in C1q binds to the 
serine proteases C1r and C1s, and mediates the self-association of six 
trimer molecules that generates the bouquet-type structure important 
for activity. Members of the collectin family, which includes the 
mannose-binding lectin, SP-A, and SP-D, all contain a collagenous domain, an 
α-helical coiled-coil domain, and a terminal carbohydrate recognition domain. 
Ficolins have a triple-helix domain with a terminal carbohydrate-binding 
fibrinogen domain. Both collectins and ficolins are host-defense molecules, 
which bind to the carbohydrate groups of microbes, leading to comple- 
ment activation and phagocytosis (Lu et al., 2002). The triple-helix domain 
of the macrophage scavenger receptor is responsible for ligand recogni- 
tion (Doi et al., 1993; Shirai et al., 1999), while the collagenous tail of the 
asymmetric form of acetylcholinesterase binds to heparan sulfate, localiz- 
ing the enzyme to the neuromuscular junction (DePrez et al., 2000). In 
addition to these examples, numerous other proteins have been observed 
to contain collagen-like domains, some of which are listed in Table I. 

Orthologues of fibril-forming collagens and of type IV collagen are 
found in a number of invertebrates, in addition to specialized collagens 
such as the cuticle collagens of C. elegans and the hydra nematocyst 
minicollagens (Exposito et al., 2002). It was thought that collagen was a 
defining feature of multicellular animals, but recent observations show 
triple-helix domains present in bacteria and viruses. Collagen-like domains 
Scl1 and Scl2 were observed in proteins expressed on the cell surface of 
group A streptococcus, and were shown to adopt a triple-helix 
conformation (Xu et al., 2002). A highly repetitive collagen-like domain was also 
identified in the filament protein of the exosporium of anthrax spores, 
and the length of this collagenous domain appears to determine the 
filament length (Sylvestre et al., 2003). A search of the genomes of various 
bacteria and viruses indicated the presence of at least 100 novel proteins 
containing collagen-related structural motifs (Rasmussen et al., 2003). 
These findings confirm that the collagen triple-helix motif can be part 
of many different kinds of proteins and can fill a wider than expected set 
of biological niches (Table I). 

The occurrence of the collagen triple helix illustrates its role in protein 
function (Fig. 1). In all cases, the triple helix forms a rodlike structure, but 
this rod can have a kink, as in C1q and MBL, or have flexible 
interruptions, as in type IV collagen. In most cases, the collagen triple helix 
self-associates to form a higher-order structure. Many possible modes of 
association are seen, including staggered parallel (fibrils), parallel 
unstaggered (stalk of MBP, SP-A and C1q), antiparallel (type VII and type VI 
collagens) and other more complex relationships in networks and fibers. 
A number of collagens and proteins with collagenous domains contain 
transmembrane domains (MSR, and collagen types XIII, XVII, XXIII, and 
XXV). Fibrillar collagen structure and network-forming collagen structures 
are the focus of Chapters 10 and 11. The triple helix is an important ligand 
binding domain in most proteins, binding to receptors, proteases, extracel- 
lular matrix proteins, GAGs, nucleic acids, and lipids in different cases (see 
section VII below). In contrast to early thinking, it is increasingly clear that 
the collagen triple helix is not a domain responsible for trimerization. The 
triple helix folds very slowly, and is usually dependent on a neighboring 
coiled coil or globular domain for nucleation (McAlinden et al., 2003). But 
once trimerized, its rodlike structure has the potential for a wide range of 
modes of self-association and the ability to bind diverse ligands, as seen in 
the growing collection of proteins with triple-helix domains. 


III. MOLECULAR STRUCTURE: COLLAGEN AND COLLAGEN 
MODEL PEPTIDES 


The advances in fiber diffraction analysis, amino acid composition data, 
molecular modeling, and polypeptide structures all converged during the 
mid-1950s to clarify the unique structure of collagen. The supercoiled 
triple-helix conformation was proposed for collagen in 1955 independently 
by Ramachandran, who had proposed a similar model without supercoiling 
in 1954 (Ramachandran and Kartha, 1954); by Rich and Crick (1955); 
and by Cowan, McGavin, and North (Cowan et al., 1955). In this 
conformation, three polypeptide chains, each in an extended left-handed 
polyproline II-helix conformation, are supercoiled in a right-handed 
manner around a common axis, with a stagger of one residue between adjacent 
chains. The most accurate model available for collagen is based on linked- 
atom least-squares refinement of the Rich and Crick II model using the 
excellent fiber diffraction data from highly stretched partially dehydrated 
kangaroo tail tendon, as reported by Fraser et al. (1979). The basic 
conformation was confirmed and complemented by a range of spectro- 
scopic and hydrodynamic studies on collagens from various species. The 
historical developments and collagen investigations are reviewed in Fraser 
and MacRae (1973). 

Although there were early reports of crystallization of a cyanogen bromide 
fragment of collagen (Yonath and Traub, 1975) and important NMR stu- 
dies were carried out on isotopically labeled collagen in tissues (Batchelder 
et al., 1982; Sarkar et al., 1983, 1987), the collagen molecule itself has not 
proved amenable to investigations at the molecular level. The path to the 
molecular details of the collagen triple helix has been through collagen 
model peptides, which have yielded high-resolution X-ray structures and 
allowed NMR characterization of dynamic and conformational features. At 
this time, the crystal structures of nine different triple-helical peptides have 
been reported, one with a bound integrin domain (Table II; Fig. 2). The 
fiber diffraction models of collagen left unresolved controversies about the 
nature of interchain hydrogen bonding, including the possibility of hydro- 
gen bonds involving alpha carbon atoms, and the precise geometrical 
parameters of both the basic helix and the supercoil. The high-resolution 
crystal structures of peptides confirmed the conformation derived from 
fiber diffraction data of collagen, and resolved a number of long-standing 
controversies about hydrogen bonding and hydration. The molecular 
details also raise new issues for consideration, including variability in helix 
twist and the mechanism of hydroxyproline stabilization. 


A. Sequence Dependence of Triple Helix Twist 


Fiber diffraction patterns of collagen in tail tendon are indexed with a 
10/3 symmetry (10 units in 3 turns) (Fraser et al., 1979; Rich and Crick, 
1961). Therefore, it was surprising when Okuyama et al. (1981) reported 
that the first crystal structure of a collagen-like peptide, (Pro-Pro-Gly)10, 
had a 7/2 symmetry (7 units in 2 turns). The difference in triple-helix 
symmetry between the (Pro-Pro-Gly)10 crystal and collagen fibers could 
have arisen from crystal packing effects, the unusually high imino acid 
content in the peptide compared with collagen, or from the absence of 
Hyp in the peptide. Recent crystallographic structures have confirmed that 
7/2 symmetry is present in (Gly-Pro-Hyp)10 and the G→A peptide, as well 
as (Pro-Pro-Gly)10 (Table II). The EKG peptide, which is homologous to 
(Pro-Hyp-Gly)10 but with one Glu-Lys-Gly triplet in the center, also shows 
strict 7/2 symmetry (Kramer et al., 2000). 

However, a nonuniform helical twist has been observed in the crystal 
structures of peptides T3-785 and IBP (Emsley et al., 2000, 2004; Kramer 
et al., 1999, 2001). These two peptides each can be viewed as having three 
zones: N-terminal Gly-Pro-Hyp repeats; a central collagen sequence; and 
C-terminal Gly-Pro-Hyp repeats (Fig. 3). In T3-785, the two terminal 
Gly-Pro-Hyp regions show 7/2 symmetry, while the central Gly-Ile-Thr- 
Gly-Ala-Arg-Gly-Leu-Ala region is closer to 10/3 symmetry. In IBP, the 
terminal Gly-Pro-Hyp repeats have 7/2 symmetry, and the central Gly- 
Phe-Hyp-Gly-Glu-Arg sequence is intermediate between 7/2 and 10/3. 
In each case, these three zones are slightly bent and twisted with respect 
to each other. It appears that the 7/2 symmetry is generated by the steric 
restrictions of repeating Gly-imino acid-imino acid units and is maintained 
when only one Gly-X-Y triplet is introduced (e.g., EKG peptide). The 
presence of 2 or 3 tripeptide units, where X and Y are not imino acids, 
starts to change the helix twist towards 10/3 symmetry. 
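
The zone description above can be made concrete with a small sketch that splits a peptide into triplets and groups consecutive triplets by whether both non-Gly positions are imino acids. It assumes one-letter codes with O standing for Hyp and a sequence supplied in triplet register; the labels "imino-rich" and "imino-poor" and the function names are illustrative, and the sketch says nothing about the actual twist, only about where the sequence composition changes.

```python
from itertools import groupby

IMINO = {"P", "O"}  # Pro and Hyp (O) in one-letter notation


def triplets(seq):
    """Split a sequence supplied in triplet register into 3-residue units."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def is_imino_rich(triplet):
    """True when the triplet has one Gly and both other residues are imino acids."""
    non_gly = [res for res in triplet if res != "G"]
    return len(non_gly) == 2 and all(res in IMINO for res in non_gly)


def twist_zones(seq):
    """Group consecutive triplets into (label, n_triplets) zones."""
    labels = ["imino-rich" if is_imino_rich(t) else "imino-poor"
              for t in triplets(seq)]
    return [(label, len(list(group))) for label, group in groupby(labels)]


if __name__ == "__main__":
    t3_785 = "POG" * 3 + "ITGARGLAG" + "POG" * 4
    print(twist_zones(t3_785))
```

For T3-785 this gives three imino-rich triplets, three imino-poor triplets, and four imino-rich triplets, matching the three zones described in the text. For the EKG peptide it reports a single imino-poor triplet, the case in which 7/2 symmetry was maintained.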


TABLE II 
Collagen-Like Peptides with High-Resolution Crystal Structures Entered in the Protein Data Bank 

Peptide                   Sequence                Residues  PDB ID  Resolution (Å)  Helical twist  Reference 
G→A                       (POG)4POA(POG)5         30        1CAG    1.85            7/2            Bella et al., 1994 
T3-785                    (POG)3ITGARGLAG(POG)4   30        1BKV    2.00            Variable       Kramer et al., 1999 
EKG                       (POG)4EKG(POG)5         30        1QSU    1.75            7/2            Kramer et al., 2000 
Hyp-                      (POG)4PG(POG)5          29        1EI8    2.00            7/2            Liu, 2000 
PPG10                     (PPG)10                 30        1K6F    1.30            7/2            Berisio et al., 2002 
GPP-foldon                (GPP)9-foldon           27        1NAY    2.60            7/2            Stetefeld et al., 2003 
PPG9                      (PPG)9                  27        1ITT    1.00            7/2            Hongo et al., unpub. 
POG10                     (POG)10                 30        1V4F    1.26            7/2            Okuyama et al., unpub. 
Integrin binding peptide  (GPO)2GFOGER(GPO)3      21        1Q7D    2.10            Variable       Emsley et al., 2004 
In complex with integrin                          21        1DZI    1.80            Variable       Emsley et al., 2000 


Fig. 2. Molecular structures of peptide T3-785 showing hydrogen bonding, sidechain orientation, and hydration. 


Fig. 3. Variation in helix twist seen in the three domains of peptide T3-785. 
[POGPOGPOG: ~7/2 symmetry; ITGARGLAG: ~10/3 symmetry; POGPOGPOGPOG: ~7/2 symmetry.] 


These peptide results suggest the collagen molecule, with its varied Gly- 
X-Y sequence, is likely to have a nonuniform helical twist along its length. 
It is likely sequences poor in imino acids will have a symmetry close to 
10/3, while stretches of Gly-Pro-Hyp units may have 7/2 symmetry. In type 
I collagen, the longest sequence of repeating Gly-Pro-Hyp triplets is found 
at the C-terminus, and 7/2 symmetry could play a role in its functioning as 
a nucleation domain (Xu et al., 2003). It is not clear at this time whether 
collagen chains have a continuous variation in superhelix twist, or do- 
mains of different symmetry connected with kinks and bends. The X-ray 
pattern of tail tendons does show a predominance of the 10/3 symmetry, 
but coherent regions may be overrepresented in fiber diffraction patterns. 
The conformational differences between the 7-fold and 10-fold symmetric 
triple-helices are subtle, but the small difference in units per turn (3.50 vs. 
3.33) would translate into a significant difference in the helix repeat over a 
long distance and could affect recognition. 
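
The size of this cumulative effect can be estimated from the two symmetries alone. The sketch below uses only the quoted figures (7 units in 2 turns, 10 units in 3 turns) and exact fractions; the function names are hypothetical, and the calculation treats each symmetry as perfectly uniform along the helix, which is an idealization.

```python
from fractions import Fraction

SEVEN_TWO = (7, 2)    # 7 units in 2 turns
TEN_THREE = (10, 3)   # 10 units in 3 turns


def units_per_turn(units, turns):
    """Units per turn for a helix with `units` units in `turns` turns."""
    return Fraction(units, turns)


def accumulated_turns(units, turns, n_units):
    """Number of turns completed after `n_units` units."""
    return Fraction(turns, units) * n_units


def phase_difference_degrees(n_units):
    """Angular offset between ideal 10/3 and 7/2 helices after `n_units` units."""
    diff = (accumulated_turns(*TEN_THREE, n_units)
            - accumulated_turns(*SEVEN_TWO, n_units))
    return float(diff * 360)


if __name__ == "__main__":
    print(float(units_per_turn(*SEVEN_TWO)), round(float(units_per_turn(*TEN_THREE)), 2))
    print(phase_difference_degrees(1))
    print(phase_difference_degrees(70))
```

Under these assumptions the two symmetries drift apart by about 5 degrees per unit, and after 70 units the 10/3 helix has completed exactly one more turn than the 7/2 helix (21 versus 20).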


B. Hydrogen Bonding 


Hydrogen bonding is a critical part of triple-helix stabilization. The very 
favorable enthalpy reported for collagen compared to other proteins is 
consistent with the importance of hydrogen bonding (Privalov, 1982). The 
triple helix has repetitive backbone hydrogen bonding networks, but 
differs from beta sheets or alpha helices in that the repeating tripeptide 
unit consists of three nonequivalent peptide groups, and not all backbone 
peptide groups participate in hydrogen bonding. It was clear from earliest 
models that one strong interchain peptide NH...OC bond could be 
formed per Gly-X-Y tripeptide unit (Ramachandran, 1967; Rich and Crick, 
1961). Ramachandran originally argued that a second NH...OC bond was 
possible when the residue in the X position was not Pro (Ramachandran 
and Kartha, 1955), but because of distortion and steric problems, this view 
was modified to suggest an interaction mediated by water (Ramachandran 
and Chandrasekharan, 1968). 

All crystal structures show a hydrogen bond between the NH of Gly in 
one chain and the C=O of the residue in the X position of the neighbor- 
ing chain, as predicted (Fig. 4). In addition, peptides with sequences 
where the X position is occupied by a residue other than Pro, show a 
second interchain hydrogen bond between the amide group of the X 
position residue and the C=O of the Gly residue, which is mediated by 
one water molecule (Emsley et al., 2000, 2004; Kramer et al., 1999, 2000). 
This second set of hydrogen bonds connects chains in a direction opposite 
to that of the first set, as proposed by Ramachandran and Chandrasekharan 
(1968). In some cases, the water molecules involved in the NH (X 
position)...CO (Gly) hydrogen bond make additional hydrogen bonds 
with Hyp or side chains. When the Y position is occupied by an amino acid, 
rather than an imino acid, its amide group is hydrated by water molecules 
directed into the solvent, which is not likely to contribute to stability. 
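
The two bonding rules in this paragraph (one direct Gly NH...CO bond per triplet, plus a water-mediated bond whenever the X residue is not Pro) can be turned into a simple per-chain count. This is a bookkeeping sketch only: it assumes one-letter codes, a sequence supplied in triplet register, and it ignores end effects and the one-residue stagger between chains. The function name and the `gly_index` parameter are hypothetical.

```python
def interchain_hbonds(seq, gly_index=2):
    """Count (direct, water_mediated) interchain hydrogen bonds for one chain.

    `seq` is a one-letter sequence whose length is a multiple of 3.
    `gly_index` is the position of Gly inside each written triplet:
    2 for sequences written X-Y-Gly (e.g. POG), 0 for Gly-X-Y (e.g. GPO).
    """
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    x_index = (gly_index + 1) % 3  # X is the residue that follows Gly
    direct = 0
    water_mediated = 0
    for i in range(0, len(seq), 3):
        triplet = seq[i:i + 3]
        if triplet[gly_index] != "G":
            continue  # a break in the repeat: no bonds counted here
        direct += 1  # Gly NH to X-position C=O of the neighboring chain
        if triplet[x_index] != "P":
            water_mediated += 1  # X amide to Gly C=O through one water
    return direct, water_mediated


if __name__ == "__main__":
    t3_785 = "POG" * 3 + "ITGARGLAG" + "POG" * 4
    print(interchain_hbonds(t3_785))
    ibp = "GPO" * 2 + "GFOGER" + "GPO" * 3
    print(interchain_hbonds(ibp, gly_index=0))
```

For T3-785 the count is ten direct and three water-mediated bonds per chain, the latter coming from the Ile, Ala, and Leu residues in the X positions of the central zone.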

The hydrogen bonding patterns seen in peptide crystals are consistent 
with hydrogen exchange studies and thermodynamic analyses (Fan et al., 
1993; Privalov, 1982; Yee et al., 1974). NMR hydrogen exchange studies on 
15N-labeled Gly-Leu-Ala residues in the central zone of peptide T3-785 
demonstrated that Gly NH exchanged the slowest, Leu NH exchanged 
almost as slowly, and Ala NH showed an exchange rate similar to peptides 
exposed to bulk solvent (Fan et al., 1993). The water-mediated hydrogen 
bond of Leu, involving the X position amide, slows the hydrogen ex- 
change process almost as much as the direct Gly interchain hydrogen 
bonds. Hydrogen exchange studies on collagen also show two distinct 
slowly exchanging sets of amide hydrogens (Privalov, 1982; Yee et al., 
1974). Therefore, the second set of water-mediated hydrogen bonds is 
likely to contribute to stability and may serve to reinforce the triple helix in 
regions lacking Pro. 

The possibility of CαH...CO hydrogen bonds in polyglycine II and 
collagen was raised by Ramachandran (1967) and Krimm and Kuroiwa 
(1968). A detailed analysis of the high resolution structure of the G→A 
peptide supports the existence of two kinds of such bonds: an H-bond 
between the Gly Cα in one chain and the Gly and Pro C=O groups from 
the other two chains; and an H-bond connecting Hyp Cα in one chain with 
a Pro C=O group on the neighboring chain (Bella and Berman, 1996). 
These CαH...CO bonds are suggested to define a network of weak 
interactions that may add stability to the strong NH...CO bonds. 


Fig. 4. Schematic of the hydrogen bonding patterns seen in the high-resolution 
structure of EKG peptide. 


C. Hydration Networks 


Water is an integral part of the collagen molecule (Fraser and MacRae, 
1973; Harrington and Von Hippel, 1961; Traub and Piez, 1971). The 
collagen triple helix has tightly bound water, and experimental evidence 
supports it being a highly ordered hydration network (Berendsen and 
Migchelsen, 1965; Grigera and Berendsen, 1979; Suzuki et al., 1980). 
The water was proposed to bind to available backbone carbonyls, since 
two C=O groups per tripeptide do not participate in direct NH...CO bonds, 
and original concepts of single water binding were expanded to 
suggest waters bonded to each other and then to several main chain 
atoms (Ramachandran and Chandrasekharan, 1968). The function of 
hydroxyproline has also been related to this hydration network. On the 
basis of an elegant analysis of thermodynamics and hydrogen exchange 
data, Privalov found that Hyp stabilization of collagen is correlated 
with increased enthalpy (Privalov, 1982). Since Hyp cannot directly bond 
with the backbone of the molecule, enthalpic stabilization supports the 
involvement of an ordered water hydrogen-bonding network. 

The determination of the high-resolution structure of the G→A peptide 
provided the first visualization of the elaborate water network that sur- 
rounds collagen molecules (Fig. 2; Bella et al., 1994, 1995). Water mole- 
cules are seen to bridge C=O groups within and between molecules and to 
link C=O and Hyp hydroxyl groups within and between molecules. Often 
four or five water molecules participate in these bridges, with pentagonal, 
clathrate-like geometry. Strikingly repetitive networks of these water pat- 
terns are seen along the chain. Water tends to order near hydrophobic 
surfaces, and Bella et al. (1995) suggest the abundance of pentagonal water 
clusters in the triple-helical cylinder of hydration may result from the sol- 
vent exposure of nonpolar Pro and Hyp residues, together with the 
availability of backbone CO groups and Hyp OH groups to anchor the 
ordered water network to the peptide. Initially, questions were raised 
about whether these intricate water networks were real (Engel and 
Prockop, 1998), but the increasing number of high-resolution structures 
has confirmed that extended water networks are an inherent feature of 
all collagen triple-helix peptide crystal structures (Berisio et al., 2001, 2002; 
Kramer et al., 2000, 2001). NMR studies have indicated the kinetically 
labile nature of this collagen hydration shell (Melacini et al., 2000). 

The nature and order of the water network appear to depend on the 
molecular packing in the crystal and the specific sequence present. In 
peptide T3-785, less ordered water was seen in the terminal Gly-Pro-Hyp 
regions that were not well packed, than in the central Gly-Ile-Thr-Gly-Ala- 
Arg-Gly-Leu-Ala region with close molecular packing (Kramer et al., 2001). 
For the EKG peptide, there was some disruption of the regular hydration 
pattern in the central charged region, where the packing is also less dense 
(Kramer et al., 2000). In addition to interactions involving backbone CO 
groups and Hyp, water is also seen to interact with sidechains, mediating 
hydrogen bonds between ionizable sidechains and between sidechains and 
the backbone (Kramer et al., 2000, 2001). In the central region of peptide 
T3-785, where there are no Hyp residues present, the waters involved in 
the hydrogen bonds between NH(X)...CO(Gly) also interact with Thr 
and Arg in the Y positions rather than Hyp (Kramer et al., 2001). 

The high-resolution structures of triple-helical peptides allow 
visualization of the ordered water network that was expected from studies on 
collagen, and have elucidated the specific bonds involved in this network 
and their repetitive nature. There is strong evidence supporting the role of 
this ordered water in maintaining molecular packing in fibrils (Leikin et al., 
1994, 1995, 1997). The biological and physical significance of this 
water network with respect to molecular stability has been a subject of 
much debate (Jenkins and Raines, 2002). The Hyp hydroxyl groups are 
clearly key players in repetitive hydration networks in the high-resolution 
structures, but one cannot assess from a crystal structure alone whether 
the entropic cost of localizing the waters will be less or greater than the 
highly favorable enthalpy from hydrogen bond formation in the collagen 
molecule (see Section IV below). 


D. Side Chain Interactions 


The availability of high-resolution structures of peptides EKG, T3-785, 
and IBP, which include residues other than Pro and Hyp in the X and Y 
positions, offers the opportunity to investigate the conformation and 
interactions of sidechains from residues typically found within the colla- 
gen triple helix. In the peptide with an EKG tripeptide sequence, the Lys 
and Glu residues did not form direct intermolecular or intramolecular ion 
pairs, even though such pairs are sterically feasible (Kramer et al., 2000). 
Instead, the Lys side chains bond to Y position carbonyl groups of an 
adjacent chain, while one Glu directly interacts with a Hyp hydroxyl group. 
There was also a range of water-mediated interactions involving the polar 
sidechains. In peptide T3-785, with the central region Gly-Ile-Thr-Gly-Ala- 
Arg-Gly-Leu-Ala, the Arg side chains make direct contacts with backbone 
carbonyl groups on an adjacent chain, confirming predictions of this 
interaction and clarifying the high stability of Arg in the Y position 
(Fig. 2; Kramer et al., 2001; Vitagliano et al., 1993). In T3-785, the Arg 
sidechains also make hydrophobic interactions with Leu and Ile sidechains 
from the same or neighboring molecules, forming nonpolar clusters that 
minimize exposure to solvent. The Thr sidechains are involved in bonding 
with the water, which mediates hydrogen bonds between the amide groups 
from the X position and Gly C=O groups. The participation of Thr in the 
Y position in the water network, much as Hyp does when it is in the Y 
position, suggests Thr could play a similar stabilizing role in invertebrates 
and bacteria. In IBP, with the central Gly-Phe-Hyp-Gly-Glu-Arg sequence, 
one Glu is involved in an intrahelix water mediated interaction with Arg, 
while Glu (as well as the Arg sidechains) are involved in direct interactions 
with backbone peptide groups of neighboring molecules. 

All sidechains in X and Y positions of the triple helix are exposed to 
solvent and appear to have multiple options of interacting through solvent 
and with available backbone carbonyl groups, in addition to sidechain- 
sidechain interactions. NMR studies on collagen fibrils show there are two 
interconverting conformations of Leu in fibrils, and considerable reorien- 
tation around the helix axis (Batchelder et al., 1982; Sarkar et al., 1983). 
This suggests there may be switching between alternative interaction sets 
in solution and even in fibrils. Thus, the sidechain orientations and 
interactions seen in the crystal structure may represent one of a number 
of possibilities, rather than a uniquely determined interaction. 


E. Molecular Packing 


One surprise in the X-ray diffraction studies was the resemblance of the 
molecular packing of peptides in the crystal structure to that seen by fiber 
diffraction for collagen molecules in tendon fibrils. The packing of 
molecules in the crystals of G→A, EKG, and POG is quasi-hexagonal with 
intermolecular spacings near 14 Å, an arrangement and distance very 
similar to collagen molecular packing (Fig. 5; Fraser et al., 1983). The 
14 Å spacing between triple-helices is spanned by highly ordered water 
molecules that connect neighboring molecules. This type of network is 
consistent with the attractive hydration forces that have been suggested to 
function in collagen assemblies (Leikin et al., 1994, 1995, 1997). The 
strong similarity between peptide and collagen organization suggests that 
the 14 Å lateral spacing and quasi-hexagonal packing observed in collagen 
assemblies are sequence-independent and determined by the effective 
diameters of the cylinder of hydration coating the triple helix. 
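
As a rough geometric illustration of the packing described above, the cross-sectional area available to each triple helix follows from the lattice spacing alone. The sketch below assumes an ideal hexagonal lattice (the text describes the packing only as quasi-hexagonal), so the result is indicative rather than exact; the function name is hypothetical.

```python
import math


def hexagonal_area_per_molecule(spacing_angstrom: float) -> float:
    """Area (in square angstroms) of the primitive cell of an ideal
    two-dimensional hexagonal lattice with the given nearest-neighbor
    spacing. Each primitive cell holds one molecule."""
    return (math.sqrt(3) / 2.0) * spacing_angstrom ** 2


# 14 Å is the intermolecular spacing quoted in the text.
area = hexagonal_area_per_molecule(14.0)
print(f"about {area:.0f} square angstroms per triple helix")
```

The value scales with the square of the spacing, so small changes in the thickness of the hydration cylinder translate into noticeably different packing densities.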

The most common intermolecular interaction—observed in GEK, GPO, 
and T3-785 crystals—was Hyp-Hyp hydrogen bonds between neighboring 
triple-helices (Fig. 5; Berisio et al., 2001; Kramer et al., 2000, 2001). The 
observation of Hyp-Hyp intermolecular interactions in the peptide crystals 
and the report that recombinant unhydroxylated collagen will not form 
fibrils (Perret et al., 2001) support the original hypothesis of Gustavson 
(1955) that Hyp plays an important role in molecular association and fibril 
formation. There are Hyp residues all along triple helices, and direct 
interactions between two Hyp groups in adjacent molecules could be 
important in stabilization or orientation of molecular packing. 

Fig. 5. (A) Cross-section of the molecular packing of the EKG peptide in the crystal. 
(B) The Hyp-Hyp interactions seen between neighboring molecules in the crystal. 

While the peptide lateral packing in crystals resembles that in collagen 
fibrils, the axial relationships, such as the D staggering (234 residues or 
670 Å), are likely to reflect interactions between charged and/or hydrophobic 
side chains. It was assumed that such interactions would be direct, between 
side chains of neighboring molecules, but in the crystal structures, an 
Arg or Lys group of one molecule usually binds to the backbone of a 
neighboring molecule (directly or through water) rather than to another 
side chain (Emsley et al., 2004; Kramer et al., 2001). The EKG peptide shows 
a staggering of neighboring molecules with charged Lys and Glu residues 
of one peptide aligned with the charged N and C-termini of other mole- 
cules (Kramer et al., 2000), supporting the importance of electrostatic 
interactions in determining axial relationships between molecules. 
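
The D stagger quoted above implies an axial rise per residue by simple division. The sketch below is only that arithmetic, using the two numbers given in the text (234 residues, 670 Å); the variable names are illustrative.

```python
# Numbers quoted in the text for the D stagger of collagen fibrils.
D_RESIDUES = 234            # residues per D period
D_LENGTH_ANGSTROM = 670.0   # axial length of one D period, in angstroms

# Axial rise per residue implied by these two values.
rise_per_residue = D_LENGTH_ANGSTROM / D_RESIDUES

# Axial length of one Gly-X-Y triplet (three residues).
rise_per_triplet = 3 * rise_per_residue

print(f"rise per residue: {rise_per_residue:.2f} angstroms")
print(f"rise per triplet: {rise_per_triplet:.2f} angstroms")
```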


F. Molecular Dynamics 


Even though the collagen triple helix is considered a rigid rod, NMR, 
fluorescence, and molecular dynamics studies indicate there are sequence- 
dependent motions of the backbone and side chains. It appears that 
Gly-Pro-Hyp is the most rigid, as well as the most stabilizing, sequence. 
In model peptides, all backbone amides show high-order parameters 
typical of a rigid structure, but the Gly amide shows a slower hydrogen 
exchange rate when in a Gly-Pro-Hyp environment (Gly-Pro-Hyp-Gly-Pro- 
Hyp-Gly-Pro-Hyp) than in the imino acid poor environment (Gly-Ala-Arg- 
Gly-Leu-Ala-Gly-Pro-Hyp) (Fan et al., 1993). Side chains of residues other 
than Pro and Hyp have considerable mobility in the collagen molecule, 
usually adopting more than one preferred orientation (Batchelder et al., 
1982). Fluorescence studies of host-guest peptides showed that Trp in the 
Y position has a more restricted motion than Trp in the X position, which 
is consistent with the greater solvent accessibility of the X position (Simon- 
Lukasik et al., 2003). Solid-state NMR studies of collagen with isotopically 
labeled residues indicate that the imino acid rings have angular fluctua- 
tions and that amino acid side chains are dynamic, reorienting around at 
least two side chain bonds (Sarkar et al., 1983, 1987). Significant azimuthal 
motions around the helix axis were also observed (Sarkar et al., 1983). 

The sequence-dependent mobility of the collagen triple helix is impor- 
tant for its biological function. The unique collagenase cleavage site of 
types I and III collagen is at a boundary with a stable N-terminal region 
and a highly unstable C-terminal sequence (Fields, 1991). A molecular 
dynamics study carried out on the collagenase cleavage site of type III col- 
lagen, based on the crystal structure of peptide T3-785, suggested an 
alternative partially unfolded conformation that could expose the triple- 
helix backbone to cleavage (Stultz, 2002). Recently, experimental studies 
indicate that collagenase binds and locally unwinds the triple-helix 
conformation in type I collagen prior to hydrolysis of its peptide bonds 
(Chung et al., 2004). Locally unstable regions, which would be expected 
to be more dynamic, have also been implicated in collagen binding of 
various ligands as discussed below in Section VI. 

Computational studies starting with collagen triple-helix structures now 
available in the Protein Data Bank are becoming increasingly important for 
collagen. As described above, molecular dynamics simulations can suggest 
models that are amenable to experimental verification. Computational 
analyses and molecular dynamics studies have also been carried out on Gly 
substitutions in the triple helix (see Section VIII below). 


IV. MECHANISM OF HYDROXYPROLINE STABILIZATION 


Typically, the imino acid content of collagen is about 20%, with at 
least half in the form of hydroxyproline. Since collagens are the only 
animal proteins with a high content of hydroxyproline, an important 
role is indicated for this posttranslationally modified residue. Early 
proposals by Gustavson supported a role for Hyp in the stabilization 
of collagen fibrils, while later studies focused on a role in stabilization of 
the triple-helical molecule (Fraser and MacRae, 1973; Gustavson, 1955; 
Ramachandran, 1967). Evidence that Hyp contributes to triple-helix stability 
came from the greater stability of (Pro-Hyp-Gly)10 (Tm = 60 °C) compared 
with (Pro-Pro-Gly)10 (Tm = 30 °C), and from the decreased stability of 
unhydroxylated collagen synthesized in the presence of prolyl hydroxylase 
inhibitors (Rosenbloom et al., 1973; Sakakibara et al., 1973). Recent studies 
on recombinant collagen expressed in tobacco plants confirm the de- 
crease in collagen stability when hydroxylation of proline is absent (Perret 
et al., 2001). 

The nature and the basis of the stabilizing effect of hydroxyproline have 
been actively investigated through studies on repeating polytripeptides 
and host-guest peptides (Table III). The predominant hydroxypro- 
line found in collagen domains is 4-Hyp (i.e., on the gamma carbon of 
the imide ring). 4-Hyp is exclusively in the R diastereoisomer form and the 
importance of this stereospecificity is illustrated by the inability of 
(Pro-4SHyp-Gly)10 to form a triple helix (Inouye et al., 1976). The location of 
(4R)-Hyp in the Y position of the Gly-X-Y repeating sequence is also critical 
since (4RHyp-Pro-Gly)10, with Hyp in the X position rather than the Y, 
does not adopt a triple-helical conformation (Inouye et al., 1982). 
Interestingly, triple-helices can be formed with (4R)-Hyp in the X position if 
(4R)-Hyp or Thr is in the Y position (Table III; Bann and Bachinger, 2000; 
Berisio et al., 2004). A very small amount of (3S)-Hyp is present in most 
collagens in the X position, but model polytripeptides, including this 
residue in the X or Y position, were not triple-helical and it had a 
destabilizing influence in host-guest peptides (Jenkins et al., 2003; Mizuno 
et al., 2004). 


TABLE III 
Thermal Stability of Repeating Polytripeptides (Gly-X-Y)10 and Host-Guest Peptides, 
(GPO)3-GXY-(GPO)4, Containing Hydroxyproline and Fluoroproline* 

                               Tm (°C) 
Triplet                 Repeating   Host-guest   References 
Pro-Pro-Gly             33          45.5         Persikov et al., 2003 
Pro-(4R)Hyp-Gly         60          47.3         Persikov et al., 2003 
(4R)Hyp-Pro-Gly         <4          43.0         Inouye et al., 1982; Persikov et al., 2003 
Pro-(4S)Hyp-Gly         <4                       Inouye et al., 1982 
(4S)Hyp-Pro-Gly         <4                       Inouye et al., 1982 
(4R)Hyp-(4R)Hyp-Gly     65          47.3         Berisio et al., 2004; Persikov et al., 2003 
Gly-Pro-(3S)Hyp         <4          37.5         Mizuno et al., 2004 
Gly-(3S)Hyp-(4R)Hyp     <4          49.6         Mizuno et al., 2004 
Pro-(4R)Flp-Gly         87          43.7         Holmgren et al., 1999; Persikov et al., 2003 
(4R)Flp-Pro-Gly         <4                       Doi et al., 2003 
Pro-(4S)Flp-Gly         <4                       Bretscher et al., 2001 
(4S)Flp-Pro-Gly         58                       Doi et al., 2003 
Gly-Pro-Thr             <4                       Bann and Bachinger, 2000 
Gly-(4R)Hyp-Thr         19                       Bann and Bachinger, 2000 

* Hodges and Raines (2003) carried out experiments for (4R)Flp and (4S)Flp in the 
X position using a (X-Y-Gly)7 design, and reached similar conclusions to those seen for 
the ten repeating units. 

The mechanism of stabilization by Hyp has generated considerable 
controversy (Fig. 6). Since the hydroxyl group of Hyp is directed outside 
and is unable to participate in any direct hydrogen bonds within the triple 
helix, the possibility of hydrogen bonding through water was suggested 
(Ramachandran et al., 1973; Suzuki et al., 1980). The strongest evidence 
for Hyp stabilization through water-mediated H bonding comes from 
Privalov’s analysis showing that, in different collagens, increased Hyp 
content is correlated with increased enthalpic stabilization (Privalov, 
1982). When the first high-resolution crystal structures were solved, hydroxyl 
groups of Hyp were seen to act as anchoring points for both 
intramolecular and intermolecular multispan water bridges (Bella et al., 
1994, 1995). It was suggested that the upward pucker of Hyp rings 
could be important because of Hyp’s favorable orientation for binding 
to water networks. Arguments were raised about the existence of this 
hydration network in the crystal structures (Engel and Prockop, 1998), 
but such organized water was later confirmed in various laboratories for all 
triple-helical peptides. A highly ordered network was also seen in 
(Pro-Pro-Gly)10 crystals, in the absence of Hyp, and questions were raised about 
whether the increased stability of (Pro-Hyp-Gly)10 peptide originates from 
the hydrogen bonding of hydroxyprolines with the hydration network 
(Holmgren et al., 1999). 

Fig. 6. Two proposed mechanisms for hydroxyproline stabilization. 

In the late 1990s, Ronald Raines’s group stimulated this area by taking a 
new approach, incorporating fluoroproline (Flp) residues in the design of 
collagen-like peptides (Jenkins and Raines, 2002). Fluorine has a greater 
electron-withdrawing effect on the imide ring than a hydroxyl group, and 
has a low tendency to form hydrogen bonds. The observation that 
(Pro-Flp-Gly)10 (Tm ≈ 90 °C) was much more stable than (Pro-Hyp-Gly)10 
(Tm ≈ 60 °C) led to investigations about the basis of Flp stabilization and 
a reexamination of the nature of Hyp stabilization. Raines proposed that 
electron-withdrawing effects in (4R)-Flp would lead to a strong preference 
for the puckering of the imino ring in an up (or exo) conformation, and 
to the favoring of the trans vs. cis imide peptide bond (DeRider et al., 
2002). The puckering of imidic acids in the triple helix seen in the crystal 
structures largely confirmed the presence of down or endo for imino 
acids in the X position and up or exo conformation for imino acids in the 
Y position, as previously concluded from fiber diffraction models (Fraser 
et al., 1979) and NMR data (Fan et al., 1993). Vitagliano et al. (2001) suggested 
that the stabilization of the triple helix by (4R)-Hyp in the Y position 
is due to its preference for the exo puckered conformation. Observations 
on the stabilizing effect of 4SFlp, which strongly prefers the down pucker 
in the X position, are consistent with this hypothesis (Table III; Doi et al., 
2003; Hodges and Raines, 2003). Hydrogen bonding through a water 
network is not a possibility for Flp, so its stabilization of the triple helix, 
which is even greater than that of Hyp, must come only from ring pucker and 
trans/cis preferences, mediated by its electronegative, inductive effect. 

Thus, recent experiments involving Flp as well as Hyp, in different 
stereoisomeric forms and in X and Y positions, have made major contribu- 
tions to our understanding of triple-helix stabilization (Berisio et al., 2002, 
2004; Persikov et al., 2003). However, it is not clear that Flp and Hyp 
stabilize the collagen triple helix by the same mechanism. The experimen- 
tal results support a predominant role for the inductive effect and its 
subsequent effect on the exo ring pucker in the stabilization of the triple 
helix by Flp. However, the large enthalpic stabilization seen for 
(Gly-Pro-Hyp)10 and for collagens suggests that hydrogen bonding of Hyp through 
hydration networks is likely to play an important role in hydroxyproline 
stabilization, in addition to its favoring of the exo pucker. 

The unique Hyp residue may be important both at the molecular and 
higher-order structure levels in collagen. Bacteria and viruses lack prolyl 
hydroxylase and have no hydroxyproline in their collagen-like domains. 
The significant differences in their amino acid composition and sequence 
compared to animal collagens suggest the use of alternative stabilization 
strategies for the triple helix (Rasmussen et al., 2003). 


V. BREAKS IN THE GLY-X-Y REPEATING PATTERN 


The basic amino acid features of the collagen triple helix can be 
reconsidered in light of the many sequences now available for this motif. 
Early studies on fibril-forming type I collagen indicated that Gly should be 
every third residue throughout a triple-helix domain. But it is clear that 
the sequences of most collagens and collagen-like domains include spo- 
radic interruptions in the repeating pattern, and that the fibril-forming 
collagens, with their (Gly-X-Y)338-341 structure, are unusual in having such 
a strict requirement. Type IV and VII collagen each contain more than 20 
breaks in their very long triple-helical sequences. A number of collagens 
have between 2 and 10 interruptions (e.g., types VIII, X, VI), while some 
domains have a single break (e.g., mannose binding protein, SP-A, C1q) 
(Table I). The structural and functional implications of these breaks in the 
Gly-X-Y repeat may vary in different molecules. Early studies on C1q 
indicated that the site of the single interruption in the triple helix led 
to a rigid kink (Kilchherr et al., 1985), while some breaks in type IV 
collagen and type XVI have been associated with flexible sites (Kassner 
et al., 2004; Siebold et al., 1987). As seen in the G→A peptide structure 
(where one Gly is replaced by an Ala) or the Hyp-peptide structure 
(where one Hyp is absent), one consequence of a break is to induce a 
discontinuity in registration between the domains on either side, serving 
to mark different regions (see Section VIII below; Bella et al., 1994; Liu, 
2000). A strict requirement for Gly as every third residue must be met in a 
domain of at least 6-7 tripeptide units in order to adopt a triple-helical 
conformation, but this conformation may be extended through breaks 
that do not meet this requirement. 
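
The idea of uninterrupted Gly-X-Y runs separated by breaks can be made concrete with a short scan over a one-letter sequence. The sketch below is an illustration only: the function name, the default threshold of 6 triplets (taken from the "at least 6-7 tripeptide units" statement above), and the toy sequence are assumptions, and O is used for Hyp as elsewhere in this chapter.

```python
def gly_xy_runs(seq, min_triplets=6):
    """Return (start_index, n_triplets) for every maximal run of
    consecutive triplets that have Gly ('G') in the first position and
    that is at least `min_triplets` long. All three reading frames are
    examined, so a break simply ends one run and starts another."""
    n = len(seq)
    # run[i] = number of consecutive complete triplets starting at i
    # whose first residue is Gly; filled from the C-terminal end.
    run = [0] * (n + 3)
    for i in range(n - 3, -1, -1):
        if seq[i] == "G":
            run[i] = 1 + run[i + 3]
    runs = []
    for i in range(n):
        # A run is maximal if no Gly-led triplet starts three residues earlier.
        starts_run = run[i] > 0 and (i < 3 or run[i - 3] == 0)
        if starts_run and run[i] >= min_triplets:
            runs.append((i, run[i]))
    return runs


# Toy sequence: 6 triplets, a two-residue interruption, then 7 triplets.
toy = "GPO" * 6 + "AA" + "GPO" * 7
print(gly_xy_runs(toy))  # [(0, 6), (20, 7)]
```

Raising `min_triplets` to 7 drops the first run, which shows how the choice of threshold decides which segments count as triple-helical domains in this simple model.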


VI. AMINO ACID SEQUENCE AND STABILITY 


All collagens have a high content of stabilizing imino acids, including 
the posttranslationally formed hydroxyproline, but there is a range of 
these important residues in collagens and collagen-like domains. In using 
translated sequences of DNA, it is assumed, based on experimental studies 
of collagens C1q and MBL, that all Pro residues in the Y position will be 
hydroxylated to Hyp, since prolyl hydroxylase specifically recognizes this 
position. Type I collagen has an equal amount of Pro and Hyp, while most 
collagens have more Hyp than Pro residues. The thermal stability of col- 
lagens from different animals correlates with their upper environmental 
temperature and with their imidic acid and Hyp content (Burjanadze, 
2000; Rigby and Robinson, 1975). 

Sequences of the form Gly-Pro-Hyp confer maximal stability to the collagen 
triple helix, while variations in the identities of the residues in the X 
and Y positions determine global thermal stability and modulate local 
stability and energetics that are required for self-association, recognition, 
and binding. Recent studies on recombinant collagen have shown that 
there are domains of varying stability along the collagen chain (Steplewski 
et al., 2004). As seen in Section IV, the use of repeating polytripeptides 
has been extremely important in defining conformational features, but 
repeats of most tripeptide sequences in collagen will not lead to a stable 
triple helix. One approach to analysis of more varied collagen sequences 
has been host-guest peptides, where one or two tripeptide units are 
introduced into a constant, stabilizing Gly-Pro-Hyp framework. This ap- 
proach has been used to determine the propensity of all 20 amino acids 
for the X position in a Gly-X-Hyp triplet, and for all 20 residues in the Y 
position in a Gly-Pro-Y triplet (Persikov et al., 2000). The most stabilizing 
residues for the X position are Pro, Glu, Ala, Lys, Arg, Gln, and Asp, while 
the most stabilizing residues for the Y position are Hyp, Arg, Met, Ile, Gln, 
and Ala. The least stabilizing residues for both positions are the aromatic 
residues and Gly. These data provide a scale for the propensities of a given 
residue in the X or Y position in a homotrimer, measured in terms of the 
destabilizing influence of each residue compared to Gly-Pro-Hyp. 

The propensities of individual residues for the triple-helix conformation 
may be affected by intramolecular interactions. For a given Gly-X1-Y1-Gly- 
X2-Y2 sequence, interchain interactions are sterically possible. Because the 
three chains (A, B, and C) are staggered by one residue, there are close 
contacts between sidechain X1 of chain A and Y1 of chain B, and between 
Y1 of chain C and X2 of chain A. In addition, intrachain X1-X2 and Y1-Y2 
interactions are possible. Measurements of thermal stability of host-guest 
peptides with varying residues in the guest positions show that a subset of 
all sterically possible ion pairs lead to favorable interactions, and some 
favorable hydrophobic interactions as well (Persikov et al., 2002, 2005). 
Helix stability is increased by interchain interactions such as Gly-Arg-Asp 
and Gly-Lys-Asp sequences, but the most energetically favorable interac- 
tions occur when sequences of the form Lys-Gly-Glu and Lys-Gly-Asp are 
present. The thermal stabilities of the 400 possible Gly-X-Y sequences 
are presented in Table IV, with the experimentally determined values 
shown in bold; those values predicted on the basis of the additivity of 
individual residues in the X and Y positions are shown in italics. A signifi- 
cant variation in thermal stability can be seen varying from the most 
stabilizing Gly-Pro-Hyp unit (Tm = 47°C) to the low stability Gly-Gly-Phe 
(Tm = 20°C). 
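
The additivity used for the predicted values can be sketched in a few lines. The form assumed here, Tm(Gly-X-Y) = Tm(Gly-X-Hyp) + Tm(Gly-Pro-Y) - Tm(Gly-Pro-Hyp), is one straightforward reading of "additivity of individual residues in the X and Y positions" and is not taken verbatim from the source. The numbers are the rounded values printed in the first row and first column of Table IV, and assigning first-column rows to Ala, Lys, and Arg relies on the X-position ordering listed earlier in this section. The function and dictionary names are hypothetical.

```python
# Rounded Tm values (°C) as printed in Table IV.
TM_GPO = 47  # Gly-Pro-Hyp reference triplet

# Gly-X-Hyp host-guest values, keyed by the X residue (one-letter code).
TM_X = {"P": 47, "A": 42, "K": 42, "R": 41}

# Gly-Pro-Y host-guest values, keyed by the Y residue (O = Hyp).
TM_Y = {"O": 47, "R": 47, "M": 43, "I": 42}


def predict_tm(x, y):
    """Additive estimate of the host-guest Tm for a Gly-X-Y triplet:
    each residue contributes its own shift relative to Gly-Pro-Hyp."""
    return TM_X[x] + TM_Y[y] - TM_GPO


for x, y in [("P", "O"), ("A", "M"), ("K", "I")]:
    print(f"Gly-{x}-{y}: predicted Tm about {predict_tm(x, y)} °C")
```

Because the inputs are already rounded, estimates from this sketch can differ by a degree or so from tabulated predictions, and they ignore the pairwise interactions discussed above that make some measured values deviate from additivity.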


TABLE IV 
Predicted (italics) and Experimentally Observed (bold) Tm Values (°C) for all Possible Gly-X-Y Tripeptide Units in a 
Triple-Helix, Based on Host-Guest Peptide Studies* 

X\Y     O   R   M   I   Q   A   V   E   T   C   K   H   S   D   G   L   N   Y   F   W 

       47  47  43  42  41  41  40  40  40  38  37  36  35  34  33  32  30  30  28  26 
        4  40  38  37  38  35  35  35  36  33  35  31  31  30  29   2  30  26  24  22 
       42  38  37  36  36  33  34  34  34  32   3  30  33  33  27  28  26  25  22  21 
       42  39  37  36   3  35  34  35  34  32  31  30  29  36  27  27  32  24  23  20 
       41  41  36  35  35  34   3  34  33  31  30  29  31  35  26  26  25  24  22  19 
       40  40  36  35  34  34   3   3   3  31  33  29  28  27  26  26  25  23  22  19 
       40  37  35  34  34  32   3   3  33  31  31  29  28  27  26  26  25   2  21  19 
       39  39  34  33  36  31  32  31  31  29  31  27  27  26  25  27  23  22  20  18 
       39  39  34   3   3   3  32  31  31  29  33  27  27  26  25   4  23  22  20  18 
               34   3  33  32  31  31  31  29  32  27  26  25  24  24  23  22  20  17 
       38  38  34  33  32  34  31  31  31  29  28  27  26  25  24  24  23  21  20  17 
       38  38  34  33  32  32  31  31  31  29  28  27  26  25  24  24  23  21  19  17 
       38  38  33  32  32  32  31  30  30  28  28  26  26  25  24  23  22  21  19  17 
       37  36  32  31  31  30  29  29  29  27  26  25  24  23  22  22  21  19  18  15 
       36  36  32  30  30  30  29  29  29  27  26  25  24  23  22  22  21  19  17  DB 
       36  36  31  30  30  30  29  29  29  27  26  25  24  23  22  22  21  19  17  DB 
       34  34  30  29  28  28  27  27  27  25  24  23  22  21  20  20  19  17  15  13 
       34  33  29  28  28  24  26  26  26  24  23  22  21  20  19  19  18  16  15  12 
       33  33  29  27  27  26  26  26  26  24  27  22  21  20  19  25  18  16  20  12 
       32  32  27  26  26  26  25  24  24  22  21  20  20  19  18  17  16  15  13  11 

*The rows of amino acids are listed in order of their X position propensity for triple-helix formation, while the amino acids in 
columns are listed in order of their Y position propensity. The predicted Tm values are based on simple additivity of the stability of the 
residue in the X position and that in the Y position (see Persikov et al., 2002). 

Quantification of the intrinsic propensities and interactions for the 
most common sequences can now be used as a basis for predicting stability 
of peptides and local stability in collagens. Comparison of the predic- 
ted stability of more than 20 peptides with observed Tm values shows there 
is good agreement in the majority of the cases (Persikov et al., 2002, 2005). 
This is an important asset in designing peptides that will form stable triple- 
helices, which are being used increasingly to map binding sites (see 
Section VII below) as well as for investigating interactions. It had been 
proposed some years ago that it would be possible to predict the local 
stability along collagen on the basis of individual tripeptide propensities 
(Bachinger and Davis, 1991), and these host-guest peptide data provide a 
realistic basis for such predictions. 


VII. LIGAND BINDING 


The binding of various ligands to the collagen triple helix is critical to 
biological function (Deprez et al., 2000; Di Lullo et al., 2002; Kadler, 1994; 
Knight et al., 2000). Collagen binding to integrins and other cellular 
receptors mediates cell adhesion, while interaction of collagen with pro- 
teoglycans and other matrix molecules organizes the extracellular matrix, 
giving each tissue its distinctive mechanical properties. The binding of 
microbial surface components recognizing adhesive matrix molecules 
(MSCRAMMs) to collagen mediates bacterial adhesion to the integrins 
of the host cell (Patti et al., 1995). The turnover of collagen involves 
recognition of one specific site on the triple helix by matrix metallopro- 
teinases (collagenase) and the biosynthesis of collagen involves binding of 
the chaperone Hsp47. Binding of ligands to collagen-like domains in host- 
defense proteins plays an important role in their physiological activity. The 
complement serine proteinases C1r and C1s bind to the collagenous 
domain of C1q during activation, while the MASP serine proteases bind 
to mannose binding lectin (MBL). Polyanionic ligands, including oxidized 
LDL, interact with the triple-helix domain of the macrophage scavenger 
receptor (MSR); the collagenous domain of asymmetric AChE binds to 
heparan sulfate in the basal lamina of neuromuscular junctions. As the list 
of proteins with collagen domains grows, so does the number of observed 
and proposed interactions. It is not clear how interactions mediated by a 
collagen triple helix compare to those found with globular proteins in 
terms of the nature of the interactions, degree of specificity, and strength 
of binding. 

Rotary shadowing and transmission electron microscopy of collagen 
with bound ligand or antibodies have been used to roughly locate the 
binding site, and binding assays using cyanogen bromide fragments of 
collagen have often narrowed down the region of interest (Di Lullo et al., 
2002). The precise definition of the sequence involved has been 
determined using triple-helical peptides that cover the region of interest 
(Glattauer et al., 1997; Knight et al., 2000), or mutational analysis for 
smaller collagen-like domains that can be studied by recombinant DNA 
methodology (Doi et al., 1993; Wallis et al., 2004). The known binding sites 
along type I collagen, both at a low and high resolution level, have been 
mapped by the group of San Antonio, in order to envision their 
relationships with each other and the D-periodic fibril organization (Di Lullo et al., 
2002). 

Although more than 50 molecules are listed as interacting with type I
collagen (Di Lullo et al., 2002) and ligands are known to bind to collagenous
domains in host-defense proteins, it has proved difficult to define
(Gly-X-Y)n sequence binding sites within the triple helix. At this time, the
specific sequence in the triple helix responsible for binding has been
defined in a small number of cases: integrin binding to type I collagen;
the binding of a monoclonal antibody to type III collagen; heparin sulfate
binding to the collagenous tail of acetylcholinesterase; ligand binding to
MSR; Hsp47 binding to type I collagen; decorin binding to type I collagen; and
MASP binding to MBL (Table V). In addition, the locations of the unique
collagenase cleavage site and the two crosslinking sites have been well
established, defining the collagen sequences that are recognized (Fields,
1991; Kadler, 1994).


TABLE V
List of Known Binding Sequences in Collagens and Collagen-Like Proteins

Collagen / protein         Binds to                      Sequence                      Reference
Type I collagen            Integrin α1β1, α2β1           GFOGER                        Knight et al., 2000
                           HSP-47                        Gly-X-Arg                     Koide et al., 2002
                           Heparin                       KGHRGF                        Sweeney et al., 1998
                           Collagenase (cleavage site)   GLA or GIA                    Fields, 1991
                           Crosslinking site             Gly-X-Hyl-Gly-His-Arg-Gly     Kadler, 1994
Type III collagen          Monoclonal antibody           GLAGAOGLR                     Glattauer et al., 1997
Type IV collagen           Integrin α1β1                 α1(IV)Asp461, α2(IV)Arg461    Golbik et al., 2000
Type V collagen            Heparin                       GKPGPRGORGPTGPRGER            Delacoux et al., 2000
Collagenous tail of AChE   Heparin                       GROGRKGRO, GROGKRGKQGQK       Deprez et al., 2000
MBL collagen domain        MASP                          GLRGLQGPOGKLGPOG              Wallis et al., 2004
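
The one-letter sequences listed in Table V can be checked against the Gly-X-Y repeat with a few lines of code. The following is a minimal sketch, not part of the cited work: the function name `gly_frame` and the dictionary labels are assumptions introduced here for illustration, and the sequences are copied from Table V (O denotes Hyp).

```python
from typing import Optional

# One-letter binding sequences copied from Table V (O = hydroxyproline).
# The dictionary keys are illustrative labels, not identifiers from the source.
TABLE_V_SEQUENCES = {
    "integrin / type I collagen": "GFOGER",
    "heparin / type I collagen": "KGHRGF",
    "monoclonal antibody / type III collagen": "GLAGAOGLR",
    "heparin / type V collagen": "GKPGPRGORGPTGPRGER",
    "heparin / AChE tail (1)": "GROGRKGRO",
    "heparin / AChE tail (2)": "GROGKRGKQGQK",
    "MASP / MBL": "GLRGLQGPOGKLGPOG",
}


def gly_frame(seq: str) -> Optional[int]:
    """Return the offset (0, 1 or 2) at which Gly occupies every third
    position of seq, or None if no such reading frame exists."""
    for offset in range(3):
        positions = range(offset, len(seq), 3)
        if len(positions) > 0 and all(seq[i] == "G" for i in positions):
            return offset
    return None


if __name__ == "__main__":
    for label, seq in TABLE_V_SEQUENCES.items():
        print(f"{label:42s} {seq:20s} Gly frame: {gly_frame(seq)}")
```

Every one-letter sequence in the table has a consistent Gly frame; KGHRGF is simply quoted starting from the Y position, so its Gly residues fall at offset 1.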

In some cases there is evidence that a local decrease in stability, or local
looseness, promotes binding. The type III collagen epitope for a monoclonal
antibody is flanked by destabilizing Gly-Gly-Y triplets (Shah et al., 1997),
and characterization of model peptides for the type IV collagen cell adhesion
site has suggested conformational heterogeneity (Sacca et al., 2003). Triple-helical
peptide models of the collagenous tail of asymmetric acetylcholinesterase 
(AChE) show more effective binding to heparin when their stability is 
lower (Deprez et al., 2000). The recognition sequence for serine protei- 
nases (MASPs) on MBL appears to be just C-terminal to the destabilizing 
Gly-Gln-Gly interruption in the triple helix (Wallis et al., 2004), and the 
collagenase cleavage site has a C-terminal locally unstable region (Fields,
1991).

A major advance in understanding collagen-ligand interactions came
with the high-resolution structure of cocrystals of the I domain of α2β1
integrin with a triple-helical peptide,
(Gly-Pro-Hyp)2-Gly-Phe-Hyp-Gly-Glu-Arg-(Gly-Pro-Hyp)3, containing the known
type I collagen binding sequence (designated the IBP peptide) (Fig. 7;
Emsley et al., 2000). The three strands of this homotrimer (designated
leading, middle, and trailing) have unique environments. Interactions with
the I domain are largely mediated by the middle strand. The middle strand
Glu completes the coordination sphere of a metal formed by the I domain
MIDAS (metal ion-dependent adhesion site) motif, while the middle strand
Arg forms a salt bridge to the I domain. The Phe residues of the middle and
trailing strands make contact with the surface of the I domain. Peptide
studies showed that the Glu is essential and the Arg is required for
high-affinity binding (Knight et al., 2000). As discussed above in Section
III.A, the IBP peptide has three distinct regions in terms of helix twist,
and the kinking and bending at the junctions of these regions appear to play
a critical role in optimizing interactions with the I domain (Emsley et al.,
2004). The dominance of one strand in interactions with the integrin may
have important implications for heterotrimeric collagen domains, and
suggests the importance of recent strategies for synthesizing heterotrimeric
collagen model peptides (e.g., see Sacca et al., 2003).


Fig. 7. Structure of the integrin I domain bound to the IBP triple-helix peptide.


VIII. MUTATIONS AND DISEASE 


More than 1000 different mutations leading to human disorders have 
been observed in various types of collagens, and the clinical manifestations 
of these diseases reflect the specific tissue distribution of the particular 
collagen type that contains the mutation. Several excellent reviews of colla- 
gen diseases have been published (Byers and Cole, 2002; Myllyharju and 
Kivirikko, 2001, 2004). Well-characterized diseases include osteogenesis 
imperfecta (OI), where bone fragility results from mutations in type I 
collagen; Ehlers-Danlos syndrome type IV, with aortic rupture as a result of
mutations in type III collagen; Alport syndrome, with progressive renal
failure resulting from basement membrane type IV collagen mutations;
and the dystrophic form of epidermolysis bullosa, with scarring and
blistering of skin due to mutations in type VII collagen, the component of
skin anchoring fibrils (Myllyharju and Kivirikko, 2001, 2004). Collagen mutations are
generally dominant, because of their presence in multisubunit molecules 
and higher-order assemblies. 

The first collagen genetic disease to be characterized was OI, a clinically 
heterogeneous disorder characterized by varying degrees of bone fragility 
(Byers and Cole, 2002). Cases are classified in four major categories, 
ranging from mild to perinatal lethal. OI results from mutations in the
genes that code for the α1(I) or α2(I) chains of type I collagen, the major
structural protein in bone. The majority of mutations involve single base
substitutions in Gly codons in the triple helix, which lead to the replacement
of a single Gly within the (Gly-X-Y)338 sequence by one of eight
bulkier amino acids (Ala, Arg, Asp, Cys, Glu, Ser, Trp, Val). More than 150
distinct mutations have been reported for different cases, and the mutation
sites are located at varying positions along the length of both α1(I) and
α2(I) collagen chains. The defective mineralization of bone collagen may
be related to delayed collagen folding, a small decrease in helix stability, 
excess posttranslational modification, increased intracellular breakdown, 
reduced secretion, and abnormal fibril assembly (Byers and Cole, 2002). It 
is not known why different Gly substitution mutations lead to differing 
degrees of OI clinical severity, but it has been suggested to be related to 
the identity of the residue replacing Gly, the location of the mutation site 
with respect to the C-terminus, and the sequence immediately surround- 
ing the mutation (Byers, 2001; Marini et al., 1993). Disorders caused by 
single base substitutions in Gly codons within the triple helix are also 
common in other collagens, even in cases such as type IV collagen, where
breaks in the Gly-X-Y repeating pattern are found normally (Hudson et al.,
2003).
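
The statement that a single base substitution in a Gly codon can yield only eight different amino acids follows directly from the standard genetic code. The sketch below enumerates the possibilities; it is an illustration added here, and the function names are assumptions rather than anything taken from the cited studies.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_base_variants(codon):
    """Yield every codon that differs from `codon` at exactly one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def gly_replacements():
    """Return the one-letter products reachable from any Gly codon (GGN)
    by a single base change, excluding synonymous (Gly) outcomes."""
    gly_codons = [c for c, aa in CODON_TABLE.items() if aa == "G"]
    outcomes = {
        CODON_TABLE[variant]
        for codon in gly_codons
        for variant in single_base_variants(codon)
    }
    return outcomes - {"G"}


if __name__ == "__main__":
    print(sorted(gly_replacements()))
```

The result is the eight residues listed above (A, R, D, C, E, S, W, V in one-letter code) plus a stop codon, which arises from GGA to TGA.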

Gly substitution mutations in the triple-helix domains of mannose- 
binding protein and Clq have also been associated with clinical disorders 
(Petry et al., 1997; Turner, 2003). A Gly→Asp mutation in the fifth codon
of the triple-helix domain of MBL is a common variant, reaching a frequency as high as
30% in some populations. This defect leads to MBL deficiency in the
serum, resulting in susceptibility to infection, particularly in young children.
Additional MBL mutations, Gly→Glu in the sixth codon of the triple
helix and Arg→Cys in the Y position of the fourth codon, are also observed.
The Gly45Asp variant has been associated with defects in the
oligomerization of MBL and in binding to its associated MASPs,
so that its host-defense function is compromised, leading to increased
infections (Wallis et al., 2004).

Since Gly replacements are such a common feature in collagen diseases, 
it is important to understand the consequences of such a break in the 
Gly-X-Y repeating pattern for the collagen triple-helix structure, and to see 
if any molecular features correlate with clinical severity in OI. Peptides 
have been used to investigate the structural and energetic consequences of 
a Gly replacement on the collagen triple helix. A single replacement of a
Gly by an Ala in (Pro-Hyp-Gly)10 was very destabilizing (Long et al., 1993),
and the crystal structure of this homotrimeric G→A peptide showed only a
small bulge at the Ala site, with a local disruption of NH...CO hydrogen bonds,
which are replaced by water-mediated hydrogen bonds (Bella et al., 1994).
The triple-helices in the Gly-Pro-Hyp sequences at both ends are well- 
ordered 7/2 helices, but the registration between the two ends is lost as a 
result of a slight untwisting at the replacement site. This loss of spatial 
coherence could lead to alterations in fibril formation and failure to
mineralize. More realistic peptide models, including a sequence of collagen
with an OI site capped by C-terminal Gly-Pro-Hyp triplets, have
been characterized by calorimetry, circular dichroism, and NMR spectroscopy
(Baum and Brodsky, 1999). The introduction of a Gly to Ser or
Ala replacement interrupts the C- to N-terminal folding of this peptide,
suggesting that some renucleation event is needed to continue through
Gly substitution sites.

The X-ray results on the Gly→Ala peptide have been complemented by
computational studies of the effect of Gly replacements on the
triple helix, largely carried out in Teri Klein’s group (Klein and Huang,
1999; Mooney et al., 2001; Radmer and Klein, 2004). Molecular dynamics
and free energy calculations were done on the G→A peptide. All peptide
models of mutations are limited by the use of homotrimers, whereas
computational methods were used to evaluate the consequences of the more
realistic model with a Gly replacement in just one or two chains.
More recently, molecular dynamics simulations were done on a sequence
modeling the site of a lethal OI mutation, examining the effects of
neighboring residues on the hydrogen bonding networks.

All Gly substitutions dramatically destabilize the triple helix in peptide 
models. Studies on host-guest peptides show the degree of destabilization 
depends on the identity of the residue replacing Gly. The residues, in
order from least to most destabilizing, are: Ala < Ser < Cys < Arg < Val <
Glu, Asp < Trp (Beck et al., 2000). This order of destabilization correlates
with the clinical severity of OI, as demonstrated when two cases with different
Gly replacements occur at the same site in α1(I) chains. For instance,
Gly883Ser gives rise to a mild OI phenotype, while Gly883Asp gives rise to 
a lethal phenotype (Byers and Cole, 2002). The spectrum of amino acids 
replacing Gly in OI cases is significantly different from that predicted on
the basis of nucleotide mutation rates and codon usage in type I collagen
(Persikov et al., 2004). This difference is most striking for the nonlethal
cases, where the least and most destabilizing residues are underrepresented.
This suggests that not every Gly replacement leads to a clinically
detectable disorder, and supports the hypothesis that the most destabilizing
substitutions are underrepresented because they are not viable, and
the least destabilizing Gly→X substitutions are underrepresented
because they are too mild to be clinically classified as OI. A schematic
illustration of the number of different replacements observed in lethal and 
nonlethal cases of OI is shown in Fig. 8. The nonlethal and lethal categories 
represent observed data, which were scaled by the total number of Ser (the 
most frequent replacement in OI) and used to estimate the number that 
were undetectable for being too mild or too severe (Fig. 8).


                                     less destabilizing ----> more destabilizing
Gly replaced by                      Ala    Ser    Cys    Arg    Val    Glu    Asp

Not observed (inferred normal)       [17]   [0]    [1]    [1]    [1]    [0]    [0]
                                     65%    0%     5%     3%     4%     0%     0%
Observed, nonlethal                  6      30     14     15     2      1      1
                                     23%    51%    70%    41%    9%     9%     2%
Observed, lethal                     3      29     5      13     10     1      8
                                     12%    49%    25%    35%    44%    9%     18%
Not observed (inferred not viable)   [0]    [0]    [0]    [8]    [10]   [9]    [36]
                                     0%     0%     0%     21%    43%    82%    80%


Fig. 8. Schematic showing the relationship between the degree of destabilization
caused by a Gly replacement by a given residue and the degree of clinical severity seen
in OI. The nonlethal and lethal phenotypes are observed data, while the normal and
nonviable categories are inferred by scaling the total number of each possible
replacement to the number observed for Ser, as described in Persikov et al. (2004).
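
The percentages in Fig. 8 are the share of each severity category within the total (observed plus inferred) number of replacements by a given residue. The sketch below recomputes them from the counts in the figure. It is illustrative only: the names are assumptions introduced here, and the Cys entry in the first row is read as 1, the value consistent with the printed 5%.

```python
RESIDUES = ["Ala", "Ser", "Cys", "Arg", "Val", "Glu", "Asp"]

# Counts transcribed from Fig. 8. The "inferred" rows are the bracketed
# values in the figure; the Cys entry of the first row is taken as 1.
COUNTS = {
    "normal (inferred)":     [17, 0, 1, 1, 1, 0, 0],
    "nonlethal (observed)":  [6, 30, 14, 15, 2, 1, 1],
    "lethal (observed)":     [3, 29, 5, 13, 10, 1, 8],
    "not viable (inferred)": [0, 0, 0, 8, 10, 9, 36],
}


def category_percentages(counts):
    """For each residue (column), express each category count as a
    percentage of that residue's column total."""
    totals = [sum(column) for column in zip(*counts.values())]
    return {
        category: [100.0 * n / total for n, total in zip(row, totals)]
        for category, row in counts.items()
    }


if __name__ == "__main__":
    for category, row in category_percentages(COUNTS).items():
        cells = "  ".join(f"{res} {pct:5.1f}%" for res, pct in zip(RESIDUES, row))
        print(f"{category:22s} {cells}")
```

The recomputed values agree with the printed percentages to within one percentage point; the small differences for Arg and Val presumably reflect rounding chosen so that each column sums to 100%.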


IX. CONCLUSIONS 


The discovery of collagen triple-helix domains in a whole range of
proteins, including those of bacteria and viruses, illustrates the adaptability of this
motif to a range of biological functions. As more human disorders are
found to be related to mutations in the collagen triple helix, the function- 
al importance of some collagen types is clarified. These mutations provide 
a motivation for basic research on sequence-dependent alterations of 
triple-helix properties, and for investigating how such alterations may lead 
to a disease phenotype. This past decade has been marked by the deter- 
mination of high-resolution structures of collagen triple-helix peptides 
and the availability of NMR data on specifically labeled residues. These 
studies have been complemented by dynamic, thermodynamic, and 
computational analyses. In the coming years, crystal structures of hetero- 
trimers and of other ligands bound to collagen domains will be needed to 
further clarify the molecular basis of function. 


X. ABBREVIATIONS AND NOTATION 


The standard one-letter and three-letter amino acid notations are used, with hydroxypro- 
line designated as Hyp in the three-letter code and O in the one-letter code. Fluoroproline is 
designated as Flp. Abbreviations used for different proteins include MBL, mannose binding 
lectin, also known as mannose binding protein; SP-A and SP-D, surfactant apoproteins A and 
D; MSR, macrophage scavenger receptor; IBP, integrin binding peptide; Scl1 and Scl2,
Streptococcus collagen-like protein 1 and 2; BclA, Bacillus collagen-like protein in anthrax.
Peptides are indicated by the notation used in Table II, or by the single letter code for the 
residues other than Gly-Pro-Hyp triplets. 

The individual collagen types are denoted by roman numerals, with individual chains
designated as α chains. For example, α1(V) indicates the α1 chain of type V collagen.

The symmetry of the triple helix is designated here as 10/3 (twist angle of 36 degrees,
10-fold symmetry, which is also designated as 10₇ in crystallographic screw symmetry notation)
and 7/2 (twist angle of 51.4 degrees, sevenfold symmetry, which is also designated as 7₅ in
crystallographic screw symmetry notation).
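
The twist angles quoted for the two symmetries are simply a full turn divided by the order of the symmetry. A minimal sketch of that arithmetic follows; the function name is an assumption introduced here for illustration.

```python
def twist_angle_deg(fold: int) -> float:
    """Twist angle in degrees for an n-fold helical symmetry (360 / n)."""
    return 360.0 / fold


if __name__ == "__main__":
    for label, fold in (("10/3", 10), ("7/2", 7)):
        print(f"{label}: {twist_angle_deg(fold):.1f} degrees")
```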
ABSTRACT


The majority of collagen in the extracellular matrix is found in a fibrillar 
form, with long slender filaments each displaying a characteristic ~67 nm 
D-repeat. Here they provide the stiff resilient part of many tissues, where 
the inherent strength of the collagen triple helix is translated through a 
number of hierarchical levels to endow that tissue with its specific mechan- 
ical properties. A number of collagen types have important structural 
roles, either comprising the core of the fibril or decorating the fibril 
surface to give enhanced functionality. The architecture of subfibrillar 
and suprafibrillar structures (such as microfibrils), lateral crystalline and 
liquid crystal ordering, interfibrillar interactions, and fibril bundles is 
described. The fibril surface is recognized as an area that contains a 
number of intimate interactions between different collagen types and 
other molecular species, especially the proteoglycans. The interplay be- 
tween molecular forms at the fibril surface is discussed in terms of 
their contribution to the regulation of fibril diameter and their role in 
interfibrillar interactions.


I. INTRODUCTION


Collagen constitutes the major proteinaceous weight of the mammalian 
body, where it is commonly found as long, slender fibrillar structures that 
are most easily recognized by a 67 nm periodicity. The collagen fibril 
serves as one of the prominent scaffolding structures utilized in animals, 
where the strength of an individual ropelike collagen molecule relates 
directly to the structural integrity and strength of the tissue. Collagen 
fibrils are substantial constituents of skin, tendon, bone, ligament, cornea, 
and cartilage, where the fundamental tensile properties of the fibril are 
finely tuned to serve bespoke biomechanical, structural, and mechano- 
transductory signaling roles. Many of these properties derive from the 
structural organization within a fibril, where the organization and topolo- 
gy of the collagen molecules ensure strong intermolecular interactions. 

The presence of subfibrillar organization may point to structural levels 
of organization that are required for the successful mechanical response of 
fibrillar collagens, and may also be part of the inevitable balance between 
crystallinity and disorder within a biological polymer. The presence of 
different collagen types within a single fibril is a structural prerequisite in 
many tissues. However, the necessity for heterotypic fibrillar structures may 
point to fine tuning of the structural properties in a composite, such as fibril 
size regulation, dispersion of crystallinity, and interfibrillar communication. 
These structure-function relationships are still being resolved. 

The overall properties and morphology of a fibril are as important as its 
internal organization. For example, the surface of a fibril is a complex area 
that contains collagen molecules and a variety of proteoglycans. These 
dictate the interaction between fibrils and specify the environment of 
partner macromolecules. They are also important in restricting fibril 
growth and permitting fusion to occur. The diameter and overall slender 
tapering of collagen fibrils has significance in determining the macro- 
scopic mechanical properties of the tissues. The basis of fibrils existing 
either as a fused mass in a tissue or as discontinuous discrete structures is 
still the subject of debate. 

The purpose of this Chapter is to review the major features of collagen 
fibrils and to demonstrate the current state of understanding the way 
collagen fibrils produce functional materials. 


II. THE COLLAGENS THAT CONSTITUTE FIBRILS


Collagen fibrils contain a number of collagen gene products. These can 
be conveniently divided into two classes: (a) fibril-forming collagens; 
and (b) fibril-associated collagens with interrupted triple helices (FACIT
collagens). In terms of incorporation into fibril structure, the fibril-forming
collagens (Types I, II, III, V, and XI) usually constitute the
majority of molecules. Their structure is dominated by the 300 nm tri- 
ple-helical structure, though some differences in the size and complexity 
of the telopeptides (the ends of the molecule) do occur with collagen 
type. FACIT collagens are a far more heterogeneous family of molecules 
(e.g., Types IX, XII, XIV, XVI and XIX). Their additional structural 
complexity is thought to be due to the requirements for interactions 
between each fibril and the cell/matrix environment. To that end, FACITs 
usually occupy the surface of fibrils and some have a transient interaction 
with fibrils during development. 

A further subdivision of the fibril-forming collagens can be made by 
examining the patterns of occurrence of such collagens in fibrils. Types I
and II collagens form two classes of fibrils, in which they are the major form of
collagen present. For the purpose of this review, such fibrils will be named
Type-I-rich fibrils and Type-II-rich fibrils, respectively. Fibril-forming collagen
Types III and V are preferentially associated with Type-I-rich fibrils, whereas
Type-II-rich fibrils frequently contain Types IX and XI collagen. Fibrils
comprising Types II and III collagen have also been observed in cartilage
(Young et al., 2000a). Many fibrillar structures are therefore described as
heterotypic. Fibrillar composition can now be detected by very sensitive 
techniques, such as infrared matrix-assisted laser desorption/ionization 
time-of-flight mass spectrometry (IR-MALDI-TOF-MS) (Dreisewerd et al., 
2004). 


A. Type I Collagen-Rich Fibrillar Structures 


Fibrillar structures that consist primarily of Type I collagen, such as 
those from rat tail tendon, appear to have distinct structural features such 
as a wide distribution of fibril diameters and a greater degree of internal 
crystallinity. It is also of importance that Type I collagen is a heterotrimeric
structure, where the presence of the α2 chain is critical to proper fibril
formation (McBride et al., 1997). Type I-rich fibrils in tissues such as skin,
however, are heterotypic and contain significant amounts of Type III
collagen, typically around 20% by mass. The triple helix of
Type III collagen is slightly longer than that of Type I although, in contrast, the
telopeptides are slightly shorter. The presence of Type III collagen is often
associated with tissues with regulated fibril diameters. Although Type III
collagen can be buried within the fibril, slowly processed N-terminal
propeptides may persist on the fibril surface. Crosslinks between Type I and
Type III collagen have also been identified, indicating specific interactions 
within the fibril (Henkel and Glanville, 1982).

Type V collagen is coassembled with Type I collagen in fibrils such as
those in cornea, and is thought to be one of the factors responsible 
for the small, uniform fibrillar diameter (25 nm) characteristic of this 
tissue (Linsenmayer et al., 1993). The entire triple-helical domain of the 
Type V collagen molecules is believed to be buried within the fibril. 
The Type I collagen molecules are thus present at the fibril surface
(Chanut-Delalande et al., 2001), along with the retained N-terminal domains
of the Type V collagen. The latter are believed to extend outward
through the gap zones. A significant feature of the triple-helical domain of 
Type V collagen is the high content of glycosylated hydroxylysine residues 
(10 times higher than for Type I collagen); this must have a significant 
influence on both the intermolecular and interfibrillar interactions of the 
triple-helical domain of the Type V collagen molecule (Mizuno et al.,
2001).

Type XIV collagen contains multiple domains and is capable of inter- 
acting with collagen fibrils and other extracellular matrix components. 
Immunoelectron microscopy has shown that Type XIV collagen is fibril- 
associated with a periodicity of 67 nm, indicating specific interactions that 
may alter during tissue development. Changes in splice variant expression 
of the gene product suggest that different functional forms of Type XIV 
collagen can be present. This allows modified interactions with fibrils 
during development that facilitate the transition from growing fibril 
intermediates to mature fibrils (Young et al., 2000b). 

In the corneal stroma, Type XII collagen may be organized along the 
collagen fibrils in a uniform head-to-tail pattern (Wessel et al., 1997). From 
its similarity to Types XII and XIV collagen, Type XX is also expected to 
bind to collagen fibrils (Koch et al., 2001). 


B. Type II Collagen-Rich Fibrillar Structures 


Type II collagen forms the main scaffold for the second class of fibrils 
described here. Far less is known about the molecular structure within
Type II-rich than Type I-rich fibrils. This results from the compositional
heterogeneity in tissues rich in Type II collagen. The fibrils usually also
lack crystallinity, which has limited the interpretation of X-ray fiber dia- 
grams. Also, fewer electron micrograph studies have been conducted 
(Ronziere et al., 1987, 1998). Type II collagen and its associated minor
collagens tend to be glycosylated to a greater extent than the collagens of
Type I-rich fibrils, which may provide part of the basis for the structural
differences that occur. Cartilage fibrils contain Type II collagen as a major
constituent, but the presence of additional components, minor collagens, and
noncollagenous glycoproteins is thought to be crucial for modulating
several fibril properties.

Collagen XI, a heterotrimeric molecule, is found predominantly in 
heterotypic cartilage fibrils, where it is incorporated into Type II-rich fibrils
and is involved in the regulation of fibrillogenesis. The partly processed 
nonhelical N-terminal regions are important in altering fibril surface prop- 
erties (Blaschke et al., 2000). Coassembly of collagens I and XI has also 
been found in fibrils of several normal and pathologically altered tissues, 
including fibrous cartilage, bone, and osteoarthritic joints (Hansen and 
Bruckner, 2003). The fibril structures appear to be significantly different 
from fibrils composed of collagen Types II and XI. 

Type IX collagen is one of the more widely studied FACIT collagens. Its 
molecular structure contains helical features for integration into the fibril, 
as well as globular and armlike structures that allow interactions between 
fibrils. Immunochemical-labeling studies established that articular carti- 
lage fibrils are biochemically heterogeneous, as different populations of 
fibrils share Type II collagen but have distinct compositions with respect to 
Type IX collagen, thereby defining their surface properties uniquely (Hagg 
et al., 1998). Transgenic mice with mutations in a Type IX gene develop 
normally, but show degenerative changes in articular cartilage after birth. 
In addition, evidence for the importance of collagen IX in human articular 
cartilage comes from the recent finding that a mutation in one of the 
collagen IX genes causes multiple epiphyseal dysplasia (Olsen, 1997). 

Collagen XVI is a minor FACIT component of fibrillar collagens (Kassner 
et al., 2004). In cartilage, collagen XVI is a component of small heterotypic 
D-banded fibrils, mainly occurring in the territorial matrix of chondrocytes. 
Electron micrograph images of collagen XVI reveal rodlike molecules that 
harbor multiple sharp kinks. These kinks, possibly caused by noncollagen- 
ous regions, give rise to a highly flexible structure. The total length of 
individual trimeric recombinant collagen XVI molecules is about 240 nm, 
as calculated by atomic force and negative staining electron microscopy. 


III. FIBRILLAR MOLECULAR PACKING


From the earlier work described here, it is clear that the collagen fibril is a 
complex structure. Many molecular species interplay to produce a fibril 
whose overall structure plays a specific role in the architecture of the tissue. 
In spite of such molecular heterogeneity, there are some central structural 
features of collagen packing that pertain more or less to all fibrillar struc- 
tures. The purpose of this section is to discuss the generic and specific 
modulation in axial and lateral packing, and show how these lead to
reconciliation between observations relating to molecular crystallinity and
overall fibrillar morphology. 

The less contentious aspect of collagen molecular packing is the axial 
structure within a fibril. Fibril-forming collagens are typically 300 nm in 
length; the 67 nm density (D) step function repeat of fibrils, observed 
using many physical imaging techniques, is explained by the molecular 
stagger between molecules being 67 nm, or an integer multiple of this. 
Since the collagen molecular length is ~4.4 D, the molecular stagger leads 
in projection to regions of high and low electron density, these being the 
overlap and gap regions (Hodge and Petruska, 1963). The association 
between collagen molecules is driven by electrostatic and hydrophobic 
interactions, where a 234-amino-acid pseudoperiod observed within the 
collagen sequence of all fibril-forming types is the key to optimal electro- 
static pairings between adjacent triple helices and maximizing the contact 
between hydrophobic regions (Hofmann et al., 1980; Hulmes et al., 1973; 
Itoh et al., 1998; Ortolani et al., 2000). The structural features of axial 
packing are shown in Fig. 1. 
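
The numbers in this description can be tied together with simple arithmetic. The sketch below assumes the values quoted in the text (D = 67 nm, a molecular length of about 4.4 D, and a 234-residue pseudoperiod) and a stagger of exactly one D between neighboring molecules; the function and constant names are illustrative assumptions, not taken from the cited work.

```python
import math

D_PERIOD_NM = 67.0          # axial repeat quoted in the text
MOLECULE_LENGTH_IN_D = 4.4  # approximate molecular length quoted in the text
RESIDUES_PER_D = 234        # pseudoperiod quoted in the text


def overlap_and_gap(length_in_d: float, d_nm: float = D_PERIOD_NM):
    """Return (overlap_nm, gap_nm) for molecules staggered by one D.

    A molecule of length (n + f) * D, staggered by D relative to its
    neighbors, leaves an overlap zone of f * D and a gap zone of
    (1 - f) * D within every D-period.
    """
    fraction = length_in_d - math.floor(length_in_d)
    return fraction * d_nm, (1.0 - fraction) * d_nm


def rise_per_residue_nm(d_nm: float = D_PERIOD_NM,
                        residues: float = RESIDUES_PER_D) -> float:
    """Mean axial rise per residue if one D-period spans `residues` residues."""
    return d_nm / residues


if __name__ == "__main__":
    overlap, gap = overlap_and_gap(MOLECULE_LENGTH_IN_D)
    print(f"overlap zone: {overlap:.1f} nm")
    print(f"gap zone:     {gap:.1f} nm")
    print(f"axial rise:   {rise_per_residue_nm():.3f} nm per residue")
```

With these inputs the overlap zone is about 27 nm, the gap zone about 40 nm, and the mean axial rise roughly 0.29 nm per residue.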

The axial association of heterotypic collagen molecules within a fibril is 
poorly understood. Indeed, the heterotypic axial register is relatively 
unstudied and information pertaining to interactions is mostly derived 
from crosslinking studies. This indicates a specificity of axial interaction 
between Type I collagen and Type III collagen. The interactions between 
Type I and Type V collagen are less well understood. Electron microscope 
evidence for the D-periodic interaction of Type IX collagen in a Type II- 
rich fibril was established by Vaughan et al. (1988). The basis of the 
interaction has been elaborated by biochemical studies that show Type 
II collagen forms extensive crosslinks with Type IX collagen (see, e.g., Eyre 
et al., 2004). These interactions are further stabilized by the development 
of molecular crosslinks between the collagen molecules. These occur 
between sites in the short nonhelical N- and C-terminal telopeptides of
the collagen molecules and the main chain of the helix (for a review 
of collagen crosslinking see Bailey, 2001). Studies on oim/oim versus 
control tendons indicate that the total absence of alpha 2(I) chains results 
in a decreased order of axial packing and a loss of crystalline lateral 
packing. This suggests that the nonequivalence of three chains is 
an important determinant of lateral interactions between adjacent 
molecules and may be involved in the long-range axial order in Type I 
collagen-containing tissues (McBride et al., 1997). 


Fig. 1. The axial organization of collagen molecules in a collagen fibril. The pattern
this arrangement produces is revealed by X-ray diffraction (bottom) and unstained
cryoelectron microscopy (top). The individual 300 nm long collagen molecules are
axially aligned in the fibril according to the Hodge and Petruska (1963) model
(middle), where the collagen molecule’s internal pseudo-periodicity facilitates
staggered molecular interaction. This produces the gap-overlap step function of
electron density that underlies the meridional series of reflections in the fiber diagram,
and also produces the characteristic banding pattern of 67 nm seen in electron
micrographs of collagen fibrils.
[Figure labels: Overlap; Gap; Length of molecule, 300 nm]


Detailed analysis of X-ray diffraction data and electron microscope data
corresponding to the axial structure of fibrillar collagens has been made.
These studies have indicated that the exact molecular periodicity
contained 234.2 amino acids per D repeat (Meek et al., 1979). The
molecular organization of the telopeptide regions has been more difficult
to ascertain. The majority of work has been carried out on the telopeptides 
from Type I collagen. Nuclear magnetic resonance (NMR) studies of the 
isolated telopeptides in solution yielded structural information indicating 
a contracted C-terminal peptide with a possible propensity to folding 
(Otter et al., 1988). The modeling of isolated peptides using structure 
prediction models produced a variety of possible structures (see, e.g., 
Helseth and Veis, 1981; Helseth et al., 1979; Jones and Miller, 1987). Such 
results may indicate that the structural stability of the telopeptides is 
conferred in part by the triple-helical environment that surrounds them 
in a fibril. Modeling studies of telopeptides constrained (or 
docked) within a hexagonally packed triple-helical environment have 
been made by Jones and Miller (1991), in part by Vitagliano et al. (1995), 
and by Malone et al. (2004). These studies have indicated that the axial 
translation per residue within the telopeptides is less than that of the triple 
helix. This is in agreement with the interpretation of X-ray and neutron 
diffraction data made by Hulmes et al. (1977, 1980), where the N-terminal 
telopeptide sequence was shown to be contracted, with a smaller axial rise 
per residue than that seen in the triple helix. However, the C-terminal 
telopeptide contained a turn that causes the polypeptide chain to fold back 
on itself. X-ray diffraction studies by Bradshaw et al. (1989) also indicated 
that the position of heavy atoms labeling the tyrosine residues in the 
telopeptide required the C-terminal telopeptide to be folded. A profile 
of the electron density at a resolution of approximately 0.5 nm was 
obtained by Orgel et al. (2000). This indicated that the folding point of 
the Type I collagen α1 chain was between residues 13 and 14 of the 
C-terminal telopeptide. This feature also ensured that the telopeptide 
lysine residue involved in intermolecular crosslinking is in register with 
its acceptor sidechain on an adjacent helical segment. A schematic of this 
structure for the C-terminal region of the α1 chain of Type I collagen is 
shown in Fig. 2. 
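
The axial figures quoted above fit together with simple arithmetic. As a rough sketch, the following Python snippet takes the 67 nm D period, the 234.2 residues per D repeat, and the nominal 300 nm molecular length from the text, and derives the mean axial rise per residue together with the gap and overlap fractions implied by a staggered arrangement of the Hodge and Petruska type. The function names are illustrative. The treatment of the gap as the shortfall between the molecular length and the next whole number of D periods is a simplifying assumption that ignores the telopeptide contraction discussed above.

```python
import math

D_PERIOD_NM = 67.0          # axial repeat quoted in the text
RESIDUES_PER_D = 234.2      # Meek et al. (1979), as quoted in the text
MOLECULE_LENGTH_NM = 300.0  # nominal molecular length quoted in the text


def axial_rise_per_residue(d_period_nm, residues_per_d):
    """Mean axial rise per residue (nm) implied by the D period."""
    return d_period_nm / residues_per_d


def gap_overlap_fractions(molecule_length_nm, d_period_nm):
    """Return (overlap, gap) as fractions of one D period.

    Simplifying assumption: molecules in a row are spaced by a whole
    number of D periods, so the gap is the shortfall between the
    molecular length and the next integer multiple of D.
    """
    length_in_d = molecule_length_nm / d_period_nm
    gap = math.ceil(length_in_d) - length_in_d
    return 1.0 - gap, gap


rise = axial_rise_per_residue(D_PERIOD_NM, RESIDUES_PER_D)
overlap, gap = gap_overlap_fractions(MOLECULE_LENGTH_NM, D_PERIOD_NM)
print(f"axial rise per residue: {rise:.3f} nm")
print(f"molecular length: {MOLECULE_LENGTH_NM / D_PERIOD_NM:.2f} D")
print(f"overlap: {overlap:.2f} D, gap: {gap:.2f} D")
```

With the rounded 300 nm length the molecule spans roughly 4.5 D. The split between gap and overlap is sensitive to the molecular length used, so these fractions should be read as indicative only.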


Fig. 2. Conformation of the C telopeptide. Determination of the projected electron 
density profile of the 67 nm repeat by employing conventional X-ray crystallography 
methods allowed the structure of the telopeptides to be shown. Here, a drawing of the 
C-telopeptide region shows a conformation that contains a reverse turn involving 
residues Pro13/Gln14. This would bring the Tyr24 residue into approximate axial 
alignment with the Tyr4 residues, and bring Lys17 into a favorable position to form the 
lysine-hydroxylysine cross-link with Hyl87 and help stabilize the microfibrillar structure. 
Adapted with permission from Orgel et al. (2000). [Figure labels: C telopeptide; end of 
triple-helical region.] 


The homotrimeric structure of Types II and III collagen, and the lack 
of sequence similarity to Type I collagen, ensure that the conformation of 
telopeptides in a Type II or Type III collagen chain is different from that in 
Type I collagen. A study of the Type II collagen telopeptide structure 
utilized electron micrograph staining patterns that were analyzed and 
interpreted in terms of amino acid sequence (Ortolani et al., 2000). 
A telopeptide model for both the N- and C-terminus was developed that 
contained molecular reversals at positions 10N-12N, 12C-14C, and 17C-19C 
for the N- and C-telopeptides. This indicates that the telopeptides have an 
"S-fold" conformation, which can be interpreted as the axial projection of a 
three-dimensional conformation. In contrast, a study of the NMR solution 
structures of isolated telopeptides from Type II collagen (Liu et al., 1993) 
indicated that the Type II C-terminal telopeptide is extended. The recent 
finding that the Type II N-telopeptide interacts with Annexin V (Lucic et al., 
2003) points to a novel intercellular communication, which adds to the 
need for the local structure of this telopeptide region to be resolved. 

The structures of the Type III and V collagen telopeptides have been less 
studied. However, an NMR study of Type III telopeptides has been 
reported, and the 22-amino-acid C-terminal telopeptide is extended with a 
tight turn involving residues 8-11 (Liu et al., 1993). Crosslink analysis 
reveals connectivity between the C-terminal telopeptide of Type III collagen 
and the N-terminal helical region of another Type III molecule (Henkel, 
1996). 

The telopeptides are characterized by a lack of the classical triplet 
Gly-Pro-Hyp. This characteristic signature is the most frequently occurring 
sequence in fibril-forming collagens, but it only accounts for about 10% 
of the amino acid triplets found. Sequences with 
low Hyp/Pro occupancy have led to suggestions that real collagen 
sequences may contain local distortions from the accepted helical 
structure formed by "model" Gly-Pro-Pro peptides (Paterlini et al., 1995). 
Simple modeling of the electron density profile of collagen defined by 
the presence of amino acids at discrete locations produces an adequate fit 
with electron micrograph data and X-ray diffraction data. However, the 
expected position of amino acids requires fine tuning in order for an 
optimal fit with the data to be obtained (Brown et al., 1997; Orgel et al., 
2000). This offers the possibility of regions of rarefaction and relative 
compression in the axial density profile. This may in part explain why 
crystallization of real collagen sequences has proven difficult. The pres- 
ence of local distortions in the structure may be induced by the fibril 
formation; these areas could correspond to regions of differing molecular 
extensibility, thereby bestowing unique mechanical properties to the fibril 
(Silver et al., 2001). 


IV. LATERAL PACKING IN FIBRILLAR COLLAGENS 


The basis for lateral packing of collagen molecules within a fibril is 
less well resolved than that for axial packing. Some fibrillar structures, 
such as those seen in rat tail tendon, have attracted particular attention as 
they exhibit long-range crystallinity (North et al., 1954; Wess et al., 1998), 
as do turkey tendon (Jesoir et al., 1981) and lamprey notochord, a Type II 
fibrillar structure (Eikenberry et al., 1984). Such structures have provided 
a basis for study since long-range order can provide a rich source of data, 
pointing to a commonality in molecular packing that may 
pervade all collagen fibrillar structures. 

Detailed investigations of packing modalities within fibrils from rat tail 
tendon favor the presence of microfibrillar structural units that are 
arranged as compressed microfibrils on a triclinic unit cell lattice (Fraser 
et al., 1983; Wess et al., 1995). Here, the pentameric topology suggested by 
Smith (1968) is adhered to, while a more realistic packing density is 
achieved by distortion of a regular pentagon to a compressed structure 
that allows quasi-hexagonal lateral packing (Trus and Piez, 1980). These 
units have been revealed in more detail recently by the determination of 
the structure of a single collagen unit cell from the electron density map 
derived from X-ray diffraction studies. This was achieved using adapted 
protocols from the multiple isomorphous replacement technique of phase 
determination in macromolecular crystallography (Orgel et al., 2001). This 
study confirmed previous interpretations of the fiber diagram, where the 
main structural features were kinked collagen molecules tilted relative to 
the fibril axis. 

A feature gleaned from earlier interpretation of the quasi-hexagonal 
molecular packing scheme for collagen was that the molecules within the 
overlap region have a common direction of tilt of magnitude ~5 degrees, 
with a direction that is almost parallel with the b axis of the triclinic unit 
cell (Miller and Tochetti, 1981). In order to ensure that the contents of all 
unit cells are identical in a crystallite, there must be a rearrangement of 
molecular connectivity in the gap region. This implies that one collagen 
molecule contains several kinks, where the overall magnitude of its tilt is 
similar in both the gap and overlap regions. However, the azimuthal 
orientation of the tilt varies to ensure that each D segment passes through 
the correct topological position of the unit cell. 

A consensus of opinion from these studies points to a one-dimensional 
staggered microfibrillar structure with intermicrofibrillar crosslinks. Such 
intermicrofibrillar links are important in the hierarchical connectivity at 
the supramicrofibrillar level and may also explain why individual 
microfibrils have proven so difficult to isolate. The molecular topology of 
compressed microfibrillar structures and three-dimensional electron 
density maps are shown in Fig. 3. Evidence for a microfibrillar structure has 
also been obtained using a number of other physical characterization 
techniques. For example, electron tomographic reconstructions in dry 
cornea fibrils revealed ~4 nm microfibrillar-type structures (Holmes 
et al., 2001). Atomic force microscopy (AFM) was used to reveal 
microfibrils of Type I collagen arranged parallel, or inclined approximately 5 
degrees, to the fibril axis (Baselt et al., 1993). 

In spite of evidence for specificity of molecular packing indicated by 
crystallinity, the X-ray fiber diagram of fibrillar collagen also contains a 
significant amount of diffuse scatter, indicating that fibrils contain a large 
amount of static or even dynamic disorder. Fibrils that contain crystalline 
regions are also thought to contain significant levels of collagen molecules 
exhibiting liquid-like disorder, where the lower density gap region of the 
fibril structure is believed to be more disordered than the overlap region 
(Fraser et al., 1987; Wess et al., 1998). 

Indeed, the molecular packing of collagen has been likened to that of 
a liquid or liquid crystal, where only local molecular interactions are 
of significance (Fratzl et al., 1993; Hukins and Woodhead-Galloway, 
1977; Knight and Vollrath, 2002). In some fibril structures, this type of 
scatter is the only feature observed by X-ray diffraction, and the overall 
molecular packing can be described as liquid-like. These observations 
point to a variety of levels of lateral molecular organization in collagen 
fibrils, ranging from liquid-like to crystalline. 

The relationship between crystallinity, disorder, and supramolecular 
topology of crystalline packing within a fibril requires explanation. 
Organized lateral packing within fibrils results in a long-range crystalline 
structure, and this has been observed in lateral sections of fibrils studied 
using electron microscopy (Hulmes et al., 1985). A further investigation by 
Hulmes et al. (1995) attempted to assess the compatibility of the multitude 
of proposed fibrillar packing models that had emerged with electron 
micrograph and X-ray diffraction data. Models of molecular packing 
within fibrils, ranging from ordered crystalline to disordered liquid-like 
structures, were constructed and their corresponding Fourier transforms 
compared to the diffraction pattern of tendon. All of the models studied 
were found to be unable to accommodate both crystallinity and disorder 
to the appropriate extent, or else contained unrepresentative molecular 
correlations. A model developed within this particular study based on 
concentric rings of microfibrillar structures was found to have the best 
agreement with the diffraction data. The structure contained sectors of 
crystalline order interfaced by disordered grain boundaries; this allowed 
crystallites to be accommodated systematically within a fibril with a circular 
cross-section. The organization of the crystallites, with the a axis of the 
lateral unit cell aligned in a radial direction and the b axis tangential to the 
fibril surface, may explain the evidence for curvature of the crystallites 
described by Fraser et al. (1983). The model also provided a structural 
basis for growth with the incremental deposition of molecules or 
microfibrils at the fibrillar surface, as suggested by the 8 nm quantal variation in 
fibril diameter observed by Parry and Craig (1979). A cross-section of 
the proposed fibrillar structure is shown in Fig. 4. 


Fig. 3. X-ray diffraction pattern of collagen fibrils in rat tail tendon. The X-ray 
diffraction pattern of rat tail tendon (top right) contains a series of intense meridional 
reflections that relate to the axial molecular organization. Three-dimensional 
microcrystalline domains of collagen molecules produce discrete Bragg peaks in the equatorial 
(m = 0; n = 0) helix layer plane and parallel to the meridian of the diffraction pattern. The 
resolution of the diffraction data extends to approximately 1.0 nm along the equator and 
0.54 nm parallel to the meridian. The Bragg diffraction peaks from such images were used 
to determine the electron density within a unit cell (top left; adapted with permission 
from Orgel et al., 2001). This clearly shows the molecular path of individual helices passing 
through the unit cell. Here, the long c axis has been compressed by a factor of eight in 
order for the molecular path to be seen. Many different models have been proposed to 
account for the 3D packing of collagen molecules in the Type I fibril. One of the earliest 
was based on a five-stranded pentagonal microfibril, where the molecular translations 
were cyclic. This was later replaced by compressed microfibrillar structures that pack 
together to produce a crystalline array. The most likely topology of interaction 
determined by Wess et al. (1998) is also shown as a dotted line set in a crystalline array 
of compressed microfibrils. The arrows indicate the position of the telopeptide 
connections between microfibrils. 

Fig. 4. A model for the possible relationship between crystalline and disordered 
regions within a collagen fibril. The cross-sectional model of a 50-nm diameter fibril 
shows regions of crystallinity interfaced by grain boundaries. The individual crystalline 
unit cells are shown and the gap region is represented by a darker color. The axial 
projection of a single microfibrillar unit is also shown. Based on the structures developed 
by Hulmes et al. (1995) and adapted with permission from Hulmes et al. (2002). 

Fig. 5. Freeze-etched micrographs and molecular models of collagen fibrils 
exhibiting straight and helicoidal conformations. Molecular models of the subfibrillar 
structures in parallel arrangement (top left) and fibrils with helical arrangement (top 
right). Below, the same arrangements are shown in freeze-fracture micrographs of 
corresponding hydrated, unfixed rat tail tendon (left), and bovine cornea (right). 
Tail tendon consists of slender subfibrillar structures winding at an extremely shallow 
angle, while the subfibrillar structures that comprise the collagen fibrils of cornea wind 
at an angle of ~17 degrees with respect to the fibril axis. Adapted with permission from 
Ottani et al. (2001). 


The molecular tilt observed in tail tendon structures is about 5 degrees 
relative to the fibril axis. It has been proposed by Ottani et al. (2001) that 
such a tilt is typical of a structural class of fibril found in tissues that resist 
uniaxial stress, such as tendon, ligament, and bone. A distinct class of fibrils 
comprises those that exhibit a molecular tilt of greater magnitude, ~18 degrees. 
This can be seen in the fibrils from tissues such as skin (Brodsky et al., 1980), 
chordae tendineae (Folkhard et al., 1987a), cornea (Marchini et al., 1986; 
Yamamoto et al., 2000a), blood vessels, nerve sheaths (Ottani et al., 2001), 
and submucosa (Cameron et al., 2002). In these tissues, fibril diameters are 
relatively uniform, typically small (less than 100 nm), and the overall 
mechanical requisite of the tissue is to resist multidirectional stresses. This 
leads to a characteristic shortening of the axial repeat from 67 to about 65 nm, 
which is close to the projected value 67 cos 18° (approximately 64 nm). 
Evidence from X-ray diffraction and electron 
microscopy suggests that the molecular tilt is 
constant throughout the fibrils (Holmes et al., 2001). This implies that such 
helicoidal arrangements must lead to the disruption of intermolecular or 
intermicrofibrillar interactions. Electron micrographs and models of 
helicoid and "straight" fibrils are shown in Fig. 5. The basis for helicoid 
fibril formation is not clearly defined, although differences in the helical 
lengths of heterotypic fibril-forming collagens could play a part in defining 
the helical path of molecules within a fibril (Cameron et al., 2002). These 
tissues also typically demonstrate a lack of crystallinity and a more liquid- 
like packing. It is attractive to speculate that the minor molecular species 
interrupt the crystalline packing of the major collagen species, but this 
remains to be proven. 
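
The geometric link between molecular tilt and the shortened axial repeat can be checked numerically. The sketch below assumes, as a simplification, that the observed repeat is the 67 nm period projected onto the fibril axis by the cosine of a single uniform tilt angle. The function names are illustrative. Under that assumption an 18 degree tilt gives a repeat of about 63.7 nm, while a 65 nm repeat corresponds to a tilt of about 14 degrees, so the quoted 65 nm and 18 degree figures agree only approximately.

```python
import math


def projected_repeat(d_period_nm, tilt_deg):
    """Axial repeat (nm) for molecules tilted uniformly by tilt_deg."""
    return d_period_nm * math.cos(math.radians(tilt_deg))


def tilt_for_repeat(d_period_nm, observed_repeat_nm):
    """Uniform tilt angle (degrees) that would give the observed repeat."""
    return math.degrees(math.acos(observed_repeat_nm / d_period_nm))


# Tilt angles quoted in the text for tendon (5), cornea (17), and skin-like tissues (18).
for tilt in (5.0, 17.0, 18.0):
    print(f"tilt {tilt:4.1f} deg -> repeat {projected_repeat(67.0, tilt):.1f} nm")
print(f"65 nm repeat -> tilt {tilt_for_repeat(67.0, 65.0):.1f} deg")
```

The 5 degree tilt of tendon shortens the repeat by less than 0.3 nm, which is consistent with tendon retaining an essentially 67 nm period.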


V. NOVEL FIBRILLAR STRUCTURES 


Up to this point, all collagen fibrillar structures discussed are of the 65 
to 67 nm axial repeat type, the most prevalent collagen form in animals. 
However, additional forms have been observed in vivo, or in pathological 
or novel tissues. These also deserve some attention, since they show the 
further diversity of collagen fibrillar structure. Fibrous long spacing 
collagen (FLS) fibrils (Highberger et al., 1950) are collagen fibrils that display a 
banding with periodicity greater than the 67 nm periodicity of native 
collagen. Typical periodicities range from 150 to 250 nm. FLS collagen 
has been described both in vitro and in vivo. An electron micrograph and 
putative model of the structure are shown in Fig. 6. In particular, FLS 
collagen has been found both in pathological and in normal tissues 
(Kamiyama, 1982; Morris et al., 1978; Nakanishi et al., 1981; Slavin et al., 
1985). FLS may be the result of partial degradation of collagen reticular 
fibrils by endogenous collagenase, or possibly a derivation from the asso- 
ciation of immature collagen microfibrils and acid mucopolysaccharides 
present in tumoral tissues. FLS fibrils can be formed in vitro by addition 
of α1-acid glycoprotein to an acidified solution of monomeric collagen, 
followed by dialysis of the resulting mixture. The assembly and formation 
of FLS and the characteristic banding pattern of the fibrils are discussed in 
Paige et al. (2001). 


Fig. 6. An electron micrograph of normal banded (left) and fibrous long spacing 
(FLS) collagen (right). In the proposed relationship of the collagen molecules, the 
extended 250 nm spacing corresponds to approximately 4D, where D = 67 nm. 
The exact molecular basis for the periodicity remains unresolved. Possible end-to-end 
molecular association or 4D stagger are shown below. These probably would require 
reinforcement by "scaffolding" molecules, such as α1-acid glycoprotein. Figure provided by 
K. Smith, University of Strathclyde. 


Other periodic fibrillar structures with D periodicities less than 67 nm 
have been reported on a number of occasions. Fibrils exhibiting banding 
patterns with 23 ± 2 nm periodicities have been described by a number 
of researchers (Porter and Pappas, 1958; Venturoni et al., 2003), and a 
9.0 nm staggered microfibrillar structure was reprecipitated from 
solution (Doyle et al., 1974). Industrial processing, such as dehydrothermal 
heating, can also produce variants in fibrillar structure (Gorham et al., 1991). 


VI. SUBFIBRILLAR STRUCTURES 


There is strong evidence that certain fibril types contain subdomains of 
structure that lie between molecular/microfibrillar and fibrillar levels. 
Such nanoassemblies are of great interest, since they may reveal important 
information that relates to the overall mechanical properties of the fibrils 
and also the nucleation and growth processes of fibril formation. Recent 
AFM studies by Wen and Goh (2004) reveal fibrillar substructures, and 
work by Gutsmann et al. (2003) led to the concept that the collagen fibril is 
an inhomogeneous tubelike structure composed of a relatively hard shell 
and a softer, less dense core. It should be noted that such density differ- 
ences may be possible from the packing model proposed by Hulmes 
et al. (1995). In contrast, evidence from Franc (1993) showed central 
condensed material of 4 nm diameter in all collagen fibrils. This persisted 
even when fibrils had been swollen and partly disorganized by ethylene glycol 
dehydration. Further cytochemical characterization indicated the 
glycoprotein nature of this central compact material. The case for a generic 
fibrillar substructure therefore remains unresolved, and it is possible that 
bespoke accretion properties may be directed by fundamentally different 
subfibrillar architectures. 


VII. FIBRIL SURFACE 


The surface of the fibril provides the interface between the internal 
structural and mechanical properties of a fibril, and the rest of the extra- 
cellular matrix. The surface, therefore, is the most complex area of the 
fibril in terms of molecular heterogeneity and structure. Caution has to be 
taken in the comparison of structural and biochemical evidence from 
complementary techniques since the extraction, dehydration, and sample 
preparation can cause variation in observations of fibril surface properties 
(Raspanti et al., 1996). 

The distribution of minor collagen types as surface molecules is more 
obvious for some types than others. Fibril-forming collagens, especially 
Types III, V, and XI, may decorate fibril surfaces with their partially 
processed N-terminal regions. A possible role of these moieties in the 
regulation of fibril diameter is discussed later. 

The FACIT collagen types have structural features that would clearly 
disrupt the internal structure of a collagen fibril, and the electron micro- 
graphs demonstrating Type IX collagen protruding from the surface of 
fibrils establish this class of molecules as modulators of surface properties. 
The N-terminal NC4 domain of Type IX collagen is a globular structure 
projecting away from the surface of the cartilage collagen fibril. Several 
interactions have been suggested for this domain, reflecting its location 
and its characteristic high isoelectric point (Pihlajamaa et al., 2004). 
Binding assays showed that the NC4 domain of Type IX collagen specifi- 
cally binds heparin at a site located in the extreme N terminus containing 
a heparin-binding consensus sequence, whereas electron microscopy sug- 
gested the presence of at least three additional heparin-binding sites on 
full-length Type IX collagen. The NC4 domain was also shown to bind 
cartilage oligomeric matrix protein. Type IX collagen appears to shield 
Type II collagen from exposure on the fibril surface. This is probably due 
to the presence of chondroitin sulphate sidechains, which will also play an 
important role in maintaining fibril spacing (Bishop, 2000). A model of 
Type IX distribution on the fibril surface proposed by Eyre et al. (2004) 
can accommodate potential interfibrillar as well as intrafibrillar links 
between the Type IX collagen molecules themselves, so providing a mech- 
anism whereby Type IX collagen can stabilize a collagen fibril network. 
Such links at the interfibrillar level are essential for the maintenance of 
structural integrity. 

In addition to a number of collagen gene products crowding the fibril 
surface, contributions are made by several noncollagenous proteins. Prin- 
cipally, these are the proteoglycans such as decorin, lumican, and fibro- 
modulin. Their role is thought to involve intrafibrillar connectivity and 
matrix-cell communication. They are also essential modulators of the 
interfacial shear between fibrils. Most proteoglycans show a nonrandom 
axial distribution, with a clear preference for the gap region within the 
D-period (see, e.g., Danielson et al., 1999; Hedlund et al., 1994; Scott and 
Thomlinson, 1998). 


VIII. FACTORS THAT MAY REGULATE FIBRIL GROWTH AND SIZE 


Since fibril assembly can be regarded in part as a spontaneous self- 
assembly process, the limitation of fibril size could be ascribed to a phys- 
ical equilibrium between soluble procollagen molecules and the growing 
insoluble fibril. Fibril-forming collagens are synthesized as precursor 
procollagens, where N- and C-terminal globular propeptide extensions 
maintain solubility. The C-propeptide directs chain association during 
intracellular assembly of the procollagen molecule from its three 
constituent alpha chains. During secretion and deposition as the extracellular 
matrix, the globular propeptides are cleaved by specific procollagen 
proteinases, triggering fibril formation (Prockop and Hulmes, 1994). Certain 
accretion models have postulated diffusion-limited growth for simple 
collagen systems (Parkinson et al., 1995), and growth by this principle would 
result in discrete fibrils of relatively fixed diameter. However, tissues such as 
tendon, which are predominantly Type I collagen, seem to contain a wide 
variety of fibril dimensions. This may indicate a lack of regulation in fibril 
diameter by a simple diffusive model, and other factors may be important 
here. In the extracellular matrix, the procollagen C-propeptides ensure 
procollagen solubility, while the persistence of the N-propeptides controls 
fibril shape (Hulmes, 2002). It has also been suggested that the N-terminal 
propeptides that persist to different extents in different collagen chain types 
play a role in determining fibril size distribution. Persistence of the N- 
propeptide from the slower processing of Type III collagen or the partial 
removal of the bulky propeptide units of Types V or XI collagen may result in 
a decoration of the fibril surface with globular protein domains. This may 
prevent further accretion, and various possibilities are discussed by 
Chapman (1989), Birk (2001), and Linsenmayer et al. (1993). Such surface 
interactions may be an important reason why many fibrils are heterotypic 
in nature. 

Type V collagen integration in corneal tissue from both avian and 
mammalian sources points to heterotypic collagen interactions providing 
some basis for fibril diameter regulation (White et al., 1997). Indeed, the 
triple-helical structure of Type V collagen alone has been shown to be 
an important factor in regulating the fibril diameter of Type V/Type I 
heterotypic fibrils in vitro (Adachi and Hayashi, 1986). 

Proteoglycans also have an important role in the regulation of fibril 
structures. Molecules such as lumican (Chakravarti et al., 1998), decorin 
(Danielson et al., 1999), and fibromodulin (Svensson et al., 1999) coat the 
surface of fibrils in a specific manner and may restrict further growth. The 
switch mechanisms that may be responsible for the coating of fibrils after 
nucleation or a certain amount of growth are still subject to speculation 
(MacBeath et al., 1993). Gene knockout experiments in mice have 
indicated that the loss of proteoglycans (such as decorin) causes alteration of 
fibrillar morphology, and fibril sizes are observed to be more polydisperse. 
However, upregulation and compensatory behavior of other associated 
proteoglycans make the results more complex to interpret than the 
simple removal of a specific gene product (Derwin et al., 2001). 

Other knockout experiments designed to investigate fibril size in 
SPARC (secreted protein acidic and rich in cysteine)-null skin found 
that the fibrils were smaller and more uniform in diameter in comparison 
to those of wild-type skin (Bradshaw et al., 2003). At five months of age, the 
average fibril diameter in SPARC-null versus wild-type mouse skin was 
60.2 nm versus 87.9 nm, respectively. Analysis of the extractable collagen 
species indicated a relative increase in Type VI collagen, accompanied by a 
decrease in Type I collagen. In SPARC-null mice, possible interactions 
between Types I and VI collagen have been described, although they are 
poorly understood. 

Work by Haston et al. (2002) also indicates that acidic glycoprotein 
(AGP) influences Type II collagen fibrillogenesis, where in vitro studies 
show that low concentrations of AGP produced decreases in fibrillogenesis 
rate and fibril diameter. High concentrations produced fibrils at a rate 
and diameter dependent on fucosylation of AGP. Highly fucosylated AGP 
produced narrow fibrils, and poorly fucosylated AGP produced thicker 
fibrils. 

The interplay of heterotypic collagens, fibril-associated collagens, and 
cofibrillar macromolecules that alter the accretion properties of available 
procollagen molecules may be required to form a typical collagen fibril 
with its characteristic cylindrical central shaft section and differential tip 
shapes. Although collagen fibril formation can be mimicked in vitro as an 
acellular system, the role of cellular processes should not be underestimated. 
The challenge for research remains to understand the interplay that exists 
between extracellular macromolecules and cellular processes that produce 
the final fibrillar structure. 


IX. DISTRIBUTION OF FIBRIL SIZES 


Fibril-forming collagen molecules are typically 300 nm long and approximately 
1 nm wide. Such regular molecules, however, form fibrillar structures whose 
length and diameter tend to vary depending on anatomical location. 
The basis for this wide distribution of fibrillar architectures is not well 
understood, but may lie in the composite nature of many fibrils. The 
effective length of fibrils has been difficult to study, since in vitro fibril 
preparation is probably not reflective of the overall length of a fibril in vivo. 
Furthermore, the lengths of the fibrils probably change with maturation. 
Disruption and dispersion of fibril structures from tissues inevitably leads 
to breakage of fibrils, and the resultant intact length may not be representative 
of the overall fibrillar length. Some attempts have been made to study 
fibril lengths in electron microscope transverse sections. This is a 
painstaking and time-consuming pursuit; therefore, the number of fibrils that can 
be studied systematically in a tissue is low. The study by Craig et al. (1989) 
estimated the length scale of rat tail tendon fibrils to be from 0.3 mm to 
greater than 10 mm, depending heavily on developmental status. Studies 
of fibril length in development by Birk et al. (1997) failed to find both ends 
of any fibril. This indicated a possible fibril length longer than sequen- 
tial microscope sections would allow, or that the fusion of fibrils pro- 
duced a networked continuum at maturity. A comparison of fibril length 
and diameter measurements using different approaches is described in 
Redaelli et al. (2003). 

Collagen fibril diameters vary over at least two decades of nanometer 
length scales, but the reasons for such a large variation amongst tissues 
remain poorly understood. It has been suggested that the mechanical 
properties of tendon are related to the fibril diameter distribution; the 
large fibrils have a primary role in withstanding high tensile forces and the 
smaller fibrils have a special ability to resist creep (Parry and Craig, 1984). 
Fibril diameters increase with maturation of tissue, but break down at 
senescence (Parry et al., 1978). However, experiments monitoring changes 
in fibril diameter with in vivo loading of tissue have resulted in little 
consensus (Michna, 1984; Patterson-Kane et al., 1997). Tissues such as 
mature Achilles tendon contain a bimodal distribution of diameters with 
thicker fibrils of diameters about 150-250 nm and a population of thinner 
fibrils with diameters of about 50-80 nm (Svensson et al., 1999). In 
structures such as cartilage and vitreous humor, there is also a clear 
bimodal distribution of fibril diameters; however, many tissues such as 
skin and submucosa contain a more uniform distribution of diameters 
around 60 nm (Sanders and Goldstein, 2001). The rationale behind a 
discrete fibril diameter in load-bearing tissues is not immediately appar- 
ent, since a priori a distribution of fibril sizes would be thought to dissipate 
the load more evenly throughout the tissue. It has been argued that 
these tissues are usually associated with tear resistance and higher compli- 
ance than either tendon or ligament. The fibril diameter distribution in a 
tissue is possibly, therefore, a balance between overall tensile strength 
and creep resistance, the former being related to fibril size and the latter 
to the interfacial shear and thus the surface area:volume ratio (for a 
review, see Ottani et al., 2001). A special case where the optical properties 
of the tissue are paramount is that of cornea. Probably the most highly 
regulated fibrillar diameter in animals is that found in adult human 
cornea (~33 nm; Meek and Fulwood, 2001). The relatively small and 
highly constrained diameters, and the distribution on a semicrystalline 
lattice, are both essential features for the transparency of the tissue. 
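
The surface area:volume argument above can be made concrete with a little geometry. As a minimal sketch, assuming each fibril is idealized as a long smooth cylinder (an assumption introduced here, not stated in the text), the lateral surface area per unit volume is 4/d, so thinner fibrils expose proportionally more interface for shear transfer. The function name and the midpoint diameters are illustrative choices; only the 33 nm corneal value and the 50-80 nm and 150-250 nm tendon ranges come from the text.

```python
def surface_to_volume_per_nm(diameter_nm: float) -> float:
    """Lateral surface area / volume of a long cylinder, in nm^-1.

    For a cylinder of diameter d and length L (end caps ignored):
    area = pi*d*L and volume = pi*d**2*L/4, so area/volume = 4/d.
    """
    if diameter_nm <= 0:
        raise ValueError("diameter must be positive")
    return 4.0 / diameter_nm


# Illustrative diameters: 33 nm is the corneal value quoted in the text;
# 65 nm and 200 nm are assumed midpoints of the quoted tendon ranges.
EXAMPLE_DIAMETERS_NM = {
    "cornea": 33.0,
    "tendon, thin population": 65.0,
    "tendon, thick population": 200.0,
}

if __name__ == "__main__":
    for label, d in EXAMPLE_DIAMETERS_NM.items():
        ratio = surface_to_volume_per_nm(d)
        print(f"{label}: d = {d:.0f} nm, S/V = {ratio:.3f} nm^-1")
```

Under this idealization a 50 nm fibril has four times the surface area per unit volume of a 200 nm fibril, which is the sense in which the thinner population may contribute disproportionately to interfacial shear and hence creep resistance.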

Care has to be taken in comparing fibril diameter information from 
a variety of sources, since sample preparation and the ability of the 
technique to estimate the distribution of fibril sizes may result in different 
fibril size distributions being obtained. In particular, standard embedding 
techniques for electron microscopy lead to an underestimation of fibrillar 
diameter by some 33% due to sample shrinkage on preparation. Techniques 
such as small-angle X-ray scattering, which has been used to estimate 
fibril diameter without embedding or sectioning, tend to skew the distribution 
to larger fibril sizes unless the scattering power from large fibrils up 
to ~500 nm in diameter is taken into account. 
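
When comparing diameters across techniques, the shrinkage figure quoted above implies a simple rescaling. The sketch below is an assumption-laden illustration rather than an established correction: it reads "underestimation by some 33%" as meaning the measured diameter is about 0.67 of the true diameter, and it treats shrinkage as uniform across fibril sizes. The function name and default are hypothetical.

```python
def correct_for_shrinkage(measured_nm: float,
                          underestimate_fraction: float = 0.33) -> float:
    """Estimate the unshrunk diameter from an embedded-section measurement.

    Assumes measured = true * (1 - underestimate_fraction), i.e. a uniform
    fractional shrinkage; real preparations may deviate from this.
    """
    if measured_nm < 0:
        raise ValueError("measured diameter cannot be negative")
    if not 0.0 <= underestimate_fraction < 1.0:
        raise ValueError("underestimate_fraction must be in [0, 1)")
    return measured_nm / (1.0 - underestimate_fraction)


if __name__ == "__main__":
    for measured in (40.0, 67.0, 134.0):
        estimate = correct_for_shrinkage(measured)
        print(f"measured {measured:.0f} nm -> estimated {estimate:.0f} nm")
```

Under these assumptions a fibril measured at 67 nm in an embedded section would correspond to roughly 100 nm before preparation, which shows how large the discrepancy between techniques can be.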


X. SUPRAFIBRILLAR ARCHITECTURES 


Although there is a strong molecular and nanoscopic driving force 
behind our understanding of the functionality of collagen fibrillar struc- 
tures, the mesoscopic and macroscopic properties of the tissue are where 
the structural integrity manifests itself. One of the differences in mechani- 
cal resistive properties between tissues such as skin and tendon is the 
feltwork nature of fibril distribution in skin. This allows resistance to strain 
to occur within a two-dimensional plane. In contrast, tendon is required 
(normally) to resist strain only along its axis. 

Cornea is a tissue where collagen fibrils of uniform diameter are regu- 
larly arranged within a matrix of proteoglycans. The fiber direction in 
cornea, however, is not uniform and consists of a succession of superposed 
layers of fibrils with different orientations. The collagen fibrils in each 
layer are parallel to one another, but are kept at a significant distance from 
one another. This is essential for the transparency of the tissue. Fibril 
distribution in the cellular cementum presents a lamellar packing pattern 
that conforms to the twisted plywood principle of bone lamellation, with a 
periodic rotation of matrix fibrils resulting in an alternating lamellar 
pattern (Yamamoto et al., 2000b). In normal articular cartilage, the colla- 
gen fibrils in the superficial zone are compactly arranged into layers of 
decussating flat ribbons, mostly parallel to the artificial split lines used for 
specimen orientation (Hwang et al., 1992). 

Collagen fibrils associate in many tissues to form discrete fiber struc- 
tures. Within one collagen fiber, the fibrils are oriented not only longitu- 
dinally but also transversely and horizontally. The longitudinal fibers do 
not run only parallel but also cross each other, forming spirals, and some 
of the individual fibrils and fibril groups form spiral-type plaits (Kannus, 
2000). Such local suprafibrillar architectures present organizational features 
of lamellae, twisting or sinusoidal “crimps” on a mesoscopic scale. 
Examples of these structures can be seen in Fig. 8. It has been postulated 
that a possible commonality between many different collagen-based tissues 

Fig. 7. The collagen fibril surface, which is an area important for interfacial 
interactions. Electron micrograph image of the collagen fibril in longitudinal and 
transverse section, adapted with permission from Kuwaba et al. (2002). In both cases, the 
proteoglycan staining follows the 67 nm periodicity, where metal ion staining of the 
proteoglycan reveals the presence of proteoglycans such as decorin. The drawing shows 
the possible way in which proteoglycan GAG chains provide the interfacial shear between 
collagen fibrils (adapted with permission from Fratzl, 2003). The possible connectivity 
between two microfibrils in opposing fibrils is shown in the figure adapted with permission 
from Vesentini et al. (2004). 

is that the suprafibrillar organization resembles the cholesteric or precholesteric 
organization of liquid crystals (Giraud-Guille, 1996; Hulmes, 
2002). Such liquid crystalline behavior was shown to occur in very high 
concentrations of collagen at low pH; this behavior has also been observed 
in procollagen molecules at high concentrations in a physiological buffer 
(Martin et al., 2000). Therefore, liquid crystalline ordering of collagen 
may precede condensation into true fibrillar forms. 

Relatively few observations have been made that relate to the specificity 
of interfibrillar interactions. In the main, fibrils are not 
required to have a regular axial spatial relationship that would lead to 
long-range coherence in the tissue. Mechanical models of interfibrillar 
relationships usually assume nonspecific interactions occurring through 
the proteoglycan-rich gel and hydrated milieu between fibrils. In the 
vitreous humor, thin heterotypic collagen fibrils have a coating of non- 
covalently bound macromolecules which, along with the surface features 
of the collagen fibrils themselves, probably play a fundamental role in 
maintaining gel stability. They are likely to both maintain the short-range 
spacing of vitreous collagen fibrils and to link the fibrils together to form a 
network. A collagen fibril-associated macromolecule that may contribute 
to the maintenance of short-range spacing is opticin, a leucine-rich repeat 
protein. Collagen fibrils in extracellular matrices of connective tissues 
(tendon, cornea, etc.) are also bridged and linked by the anionic glycosa- 
minoglycans (AGAGs) of the small proteoglycans (decorin, etc.). Such 
bridges maintain the collagen fibril supramolecular architecture and were 
proposed as shape modules by Scott and Thomlinson (1998). Figure 7 shows 
an electron micrograph and schematic of the fibril surface. The close 
association of proteoglycans or glycosaminoglycans (PGs/GAGs) with ma- 
turing Type VI collagens may provide a further indication to the link that 
associates D-periodic collagen fibrils via PGs/GAGs. Ruthenium red- 
stainability on the surface of D-periodic collagen fibrils was also examined, 
the results showing that these sites were also D-periodically associated 
(Watanabe et al., 1997). 

Furthermore, specific interactions of collagen fibrils with the GAG-rich 
regions of several aggrecan monomers aligned within a proteoglycan 
aggregate have been identified. The fibril could therefore serve as a 
backbone in at least some of the aggrecan complexes (Hedlund et al., 
1999). A highly specific pattern of crosslinking sites suggests that collagen 
Type IX has evolved to function as an interfibrillar network-bonding 
agent. Interfibrillar coherence has been observed in some mineralized 
tissues, such as turkey leg tendon (Landis et al., 1996), where the axial 
banding pattern of several mineralized fibrils is in register. This may result 

Fig. 8. Suprafibrillar architectures reveal a number of features that resemble 
cholesteric and precholesteric states. (A) Fibrils at a lesion between a bone-tendon 
interface imaged by SEM show the intertwining helical nature of fibrillar interactions. 
Adapted with permission from Oguma et al. (2001). (B) Scanning electron microscope 
views of the collagen fibrillar network of the mouse pubic symphysis. Adapted with 
permission from Pinheiro et al. (2004). (C) Cross-striated collagen fibrils in a stabilized 
collagen gel after neutralization. The fibrils follow undulating patterns and mimic 
geometries described as crimp morphologies in tendons (bar = 0.5 µm). Adapted with 
permission from Giraud-Guille et al. (2003). 


from the growth of mineral between fibrils, but may be an association 
required prior to interfibrillar calcification. The significance of this, and 
the specificity of interfibrillar interactions, remains a rich vein for future 
research. 

The distribution of fibrils as fibers or fibril bundles occurs as a common 
motif in many tissues, with the fiber direction being aligned to provide the 
bespoke mechanical characteristics of the tissue. Suprabundle structures 
of collagen often provide the interface of collagen with other biopolymers, 
such as elastin and fibrillin. These elastic components complement the 
stiff collagen fibers and produce the overall functional properties of the 
tissue. Periodontal ligament contains principal fibers developed from 
aggregates of fibrils that are allowed to form during root development 
in the space between ligamental cells (Yamamoto and Wakita, 1992). In 
the case of tendon, the suprafibrillar hierarchical structures are well 
characterized. Here a group of collagen fibrils forms a collagen fiber— 
the basic unit of a tendon. Fibers are bound together by a fine sheath of 
connective tissue called the endotenon. A bunch of collagen fibers forms 
a primary fiber bundle, and a grouping of primary fiber bundles results in 
a secondary fiber bundle. Secondary fiber bundles accumulate to form 
tertiary bundles that comprise the tendon (Kannus, 2000). 


XI. MECHANICAL PROPERTIES OF COLLAGEN-RICH TISSUES 


The outstanding mechanical properties of tissues that rely on fibrillar 
collagen are due to the optimization of their structure on many hierarchi- 
cal levels. The interplay between structures already described from the 
molecular to fibrillar, and thence to interfibrillar, is critical to the overall 
mechanical properties (Fratzl, 2003). Strain in the tissue results from two 
principal mechanisms—the molecular elongation and molecular shear 
within a fibril, and the shear deformation of the proteoglycan-rich matrix 
between fibrils. The correct supramolecular organization and crosslinking 
within each fibril is therefore essential. The response of the tissue to the 
stress imposed depends very much upon the strain rate, where high strain 
rates cause elongation of the collagen molecules and initiate shearing 
effects within a fibril. Low strain rates result in shear of the matrix between 
the fibrils, leading to creep (Sasaki et al., 1999). 

Synchrotron radiation studies of fibril behavior in tissues have been 
critical to investigations of structural alterations, since they allow transient 
structural features to be monitored in realistic time frames. Stress-strain 
experiments conducted synchronously with X-ray diffraction indicate 
that only 40% of the increase in the D period before breakage 
(from 67 nm to just under 69 nm) is from the extension of the collagen 
helix. The rest of the fibril extension must result from some form of 
molecular rearrangement. This appears with the change in the relative 
intensity of the second- and third-order Bragg peaks of the meridional 
series, indicating a severe distortion in the usual gap/overlap step function 
that is characteristic of fibrillar collagens (Sasaki and Odajima, 1996). 
In situ synchrotron X-ray scattering experiments suggest that several dif- 
ferent processes could dominate, depending on the amount of strain. 
While at small strains there is a straightening of kinks in the collagen 
structure (first at the fibrillar and then at the molecular level), higher 
strains are believed to lead to molecular gliding within the fibrils and 
ultimately to a disruption of the fibril structure. Preliminary attempts to 
quantify these distortions in terms of alteration of the collagen axial 
structure met with limited success (Folkhard et al., 1987b). Here, in similar 
studies, three elementary models for molecular elongation and rearrange- 
ment of collagen within a fibril were proposed. In the first model, molec- 
ular elongation arises through an alteration of the helical pitch. In the 
second model, there is an increase in the length of the gap region. In the 
third model, there is relative slippage of laterally adjoining molecules. 
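
The D-period figures quoted earlier in this section lend themselves to a short worked calculation. The sketch below is illustrative only: it takes 69 nm as the stretched D period (the text says "just under 69 nm", so the results are upper bounds) and applies the quoted 40% helix contribution directly. All names are hypothetical.

```python
D_REST_NM = 67.0        # unstrained D period quoted in the text
D_STRETCHED_NM = 69.0   # "just under 69 nm", so treated here as an upper bound
HELIX_FRACTION = 0.40   # share of the increase attributed to helix extension


def d_period_strain(d_rest_nm: float, d_stretched_nm: float) -> float:
    """Fractional increase in the D period."""
    if d_rest_nm <= 0:
        raise ValueError("resting D period must be positive")
    return (d_stretched_nm - d_rest_nm) / d_rest_nm


def partition_extension(d_rest_nm: float, d_stretched_nm: float,
                        helix_fraction: float) -> tuple[float, float]:
    """Split the D-period increase (nm) into helix and rearrangement parts."""
    if not 0.0 <= helix_fraction <= 1.0:
        raise ValueError("helix_fraction must be in [0, 1]")
    total = d_stretched_nm - d_rest_nm
    helix = helix_fraction * total
    return helix, total - helix


if __name__ == "__main__":
    strain = d_period_strain(D_REST_NM, D_STRETCHED_NM)
    helix, rearrangement = partition_extension(
        D_REST_NM, D_STRETCHED_NM, HELIX_FRACTION)
    print(f"fibril strain at most {strain:.1%}")
    print(f"helix extension about {helix:.1f} nm per D period")
    print(f"molecular rearrangement about {rearrangement:.1f} nm per D period")
```

On these numbers the fibril strain before breakage is at most about 3%, of which at most roughly 0.8 nm per D period comes from stretching the helix and roughly 1.2 nm from rearrangement such as gap lengthening or molecular slippage.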

The shortcomings of such investigations can be interpreted as: (a) the 
starting fibrillar model is of insufficient accuracy to mimic the changes 
imposed by the mechanisms of distortion; or (b) the emphasis, or even the 
entire basis, for the mechanisms of distortion is incorrect. 

It has also been noted that the strain within collagen fibrils is always 
considerably smaller than in the whole tendon. This phenomenon is still 
very poorly understood, but points toward the existence of additional 
gliding processes occurring at the interfibrillar level (Fratzl et al., 1998). 
Once again, this leads to the fibril surface and interfibrillar interactions 
having a more prominent role than generally appreciated. In turn, this 
area deserves more attention than it currently receives. Mechanisms for 
interfibrillar stress transfer almost certainly involve the surface and inter- 
fibrillar proteoglycans, where molecular entanglement and electrostatic 
interactions may provide the basis for shear resistivity. The glycosamino- 
glycans bound to decorin act like bridges between contiguous fibrils 
connecting adjacent fibrils every 64-68 nm. This architecture would sug- 
gest their possible role in providing the mechanical integrity of the tendon 
structure. Such interactions have been investigated (among others) by 
Redaelli et al. (2003). 


XII. CONCLUSIONS 


The collagen fibril is a complex structure with a fundamental D-repeat 
of approximately 65 to 67 nm. This provides a framework for how different 
molecular species interact with one another, to provide fibrillar structures 
with features suited to both their roles as biomechanical tensile materials 
and as a source of intermolecular connectivity. The ubiquity of collagen in 
most animal tissues shows that a high degree of functionality can be 
achieved by modulations on a relatively stable framework. The hierarchical 
organization of collagen-based materials from the molecular to the 
functional tissue allows for the intervention and interplay of a series of 
design features that ensure the “triple-helical rope” is put to best use in 
each case. 

Future studies will focus on understanding the contribution of different 
hierarchical levels to the overall performance of the tissue. This in turn 
will allow the optimum design of biomimetics and future biomaterials, and 
will provide essential information for tissue engineering. 

ABSTRACT 


Different collagen types can vary considerably in length, molecular 
weight, chemical composition, and the way they interact with each other 
to form molecular aggregates. Collagen Types IV, VI, VIII, X, and dogfish 
egg case collagen make linear and lateral associations to form open net- 
works rather than fibers. The roles played by these network-forming col- 
lagens are diverse: they can act as support and anchorage for cells and 
tissues, serve as molecular filters, and even provide protective permeable 
barriers for developing embryos. Their functional properties are intimately 
linked to their molecular organization. This Chapter reviews what is known 
about the molecular structure of this group of collagens, describes the ways 
the molecules interact to form networks, and—despite the large variations 
in molecular size—identifies common aggregation themes. 


I. INTRODUCTION 


There are more than 25 distinct types of collagen; their amino acid 
sequences and molecular weights can vary considerably from one type to 
another. Not surprisingly, then, individual types of collagen can aggregate 
to form different types of structures. Types IV, VI, VIII, X, and dogfish egg 
case collagen, for example, can aggregate linearly and laterally to give rise 
to open networks. These collagens are present in smaller proportions than 
Type I collagen, but their importance is underlined by the variety of 
essential roles that they play in organisms. In fact, collagen networks act 
as supporting structures for cells and tissues, serve as selective molecu- 
lar filters and barriers, function as anchorage for neighboring cells, and 
(in the case of the networks in dogfish egg cases) contain and protect 
developing embryos. 

Of the network-forming collagens, collagen Type IV is arguably the most 
important as the main component of basement membranes (see below). It 
is a relatively long molecule (about 400 nm long) that assembles irregu- 
larly when forming networks. Since structural biologists need regularity 
in the systems that they investigate, from a structural point of view 
collagen IV is very difficult to analyze. Nonetheless, by studying limited 
regions of the molecule (e.g., the N- and C-terminal regions) and the 
general modes of lateral interaction, it has been possible to arrive at a 
working model for the construction of collagen IV networks. 

Collagens Type VIII and X form hexagonal networks. They are shorter 
than collagen IV and the networks that they form are more regular. Howev- 
er, no detailed three-dimensional (3D) description exists of the networks 
that they form in terms of crystallographic symmetry and stoichiometry. 

For a long time, Type VI collagen was not generally considered as part of 
the network-forming collagen family. In the past, collagen VI tetramers 
were described as aggregating linearly to form so-called beaded filaments. It 
now appears that, as well as aggregating linearly, they are also able to 
associate laterally through their globular domains, thus forming 3D net- 
works. It is not certain whether these lateral associations can occur spon- 
taneously, or if they are mediated by other molecules. However, unlike 
collagens IV and VIII in basement membranes, collagen VI lateral net- 
works may not necessarily have physiologically important structural roles. 
It is possible that they arise as a consequence of aging or pathological 
conditions. Nevertheless, their analysis can throw light on the general 
structural patterns of collagen network formation, as well as improve 
our knowledge of the pathological or aging processes with which they 
are associated. Type VI filaments, on the other hand, play important 
physiological roles; Type VI collagen mutations have been shown to be 
linked to Bethlem myopathy (Scacheri et al., 2002) and Ullrich syndrome 
(Camacho Vanegas et al., 2001). 

Dogfish egg case collagen is relatively short (about 45 nm) and assem- 
bles in a remarkably regular network. It has proved a good model for the 
study of collagen network formation, in particular collagen VI networks. 
Nematocyst minicollagens have also been reported. These are the shortest 
known collagens, with only 14 Gly-X-Y repeats. They are the main consti- 
tuents of the capsule wall of the nematocyst, an explosive organelle used 
by cnidaria such as hydras, jellyfish, and corals for defensive, aggressive, or 
locomotive purposes (Engel et al., 2001; Holstein et al., 1994). Although 
exhaustive structural work has not yet been carried out, it is possible that 
these collagens form networks by crosslinking linearly through cysteine- 
rich globular domains (Ozbek et al., 2002; Pokidysheva et al., 2004; Skaer 
and Picken, 1965). 

This Chapter contains a description of the main network-forming col- 
lagens. It highlights the main structural characteristics of each individ- 
ual collagen that allow network formation, and ends by discussing the 
underlying structural commonalities between different network collagen 
types. 


II. NETWORK-FORMING COLLAGENS IN DETAIL 


A. Type IV Collagen 


Type IV collagen is a fundamental component of basement membranes 
(Fig. 1A). Basement membranes represent the portion of extracellular 
matrix that remains in direct contact with its formative cells. Basement 
membranes play an important role in cell adhesion, growth and differen- 
tiation, tissue repair, molecular ultrafiltration, cancer cell invasion, and 
metastasis. 

Type IV collagen is composed of three 400-nm-long polypeptide chains. 
The triple-helical domains of the polypeptide chains are often interrupted 
(Fig. 1B). In humans, for example, the α1 chain has 21 interruptions of 
the Gly-X-Y triplet sequence, and in the α2 chain there are 23 interruptions. 
Overlaps between these Gly-X-Y discontinuities produce 26 irregularly 
spaced interruptions, resulting in a collagen that is flexible (Brazel 
et al., 1988). The invariance in location of many of these interruptions 
among different species suggests a functional role (Timpl, 1989), but few 
functions have yet been directly demonstrated for the interruptions. From 
an amino acid sequence analysis of collagen VI (Knupp and Squire, 2001), 
it is apparent that interruptions of the Gly-X-Y repeat in this type of 
collagen reduce the stiffness of the molecule and help supercoiling. 
Gly-X-Y interruptions may play a similar role in collagen IV. It also appears 
likely that the flexibility generated by the interruptions is essential to allow 
the formation of a network rather than a fiber (Yurchenco and Ruben, 
1987). 

Fig. 1. (A) Metal-shadowed replica of partially purified glomerular basement 
membrane. The arrowhead points to a collagen IV network. The black arrow indicates 
a Type I collagen fiber. (Micrograph courtesy of Dr. M. Chew.) (B) Diagram of a collagen 
IV molecule. Along the triple-helical part (represented by gray rectangles), numerous 
interruptions of the Gly-X-Y amino acid repeat are present. Linear polymerization 
is observed through (C) C-terminal interactions and (D) N-terminal interactions. 
(E) Lateral interactions are also observed with apparent molecular supercoiling. 
(F) Collagen IV is able to form networks by linear and lateral associations. 

Collagen IV α1 and α2 chains are found in all mammalian basement 
membranes (Borza et al., 2001; Sado et al., 1998). Variant Type IV collagen 
α3, α4, α5, and α6 chains have been identified (Butkowsky et al., 1987; 
Oohashi et al., 1994) and are present in a more restricted tissue distribution. 
The α3, α4, and α5 chains are found in the renal glomerular 
basement membrane and pulmonary alveolar basement membrane. The 
α5 and α6 chains are found in smooth muscle, aorta, bladder, mammary 
duct and lobule, and epidermis basement membranes. The α3, α4, and α5 
chains are thought to be responsible, when defective, for Alport syndrome 
(Sado et al., 1998). There is evidence that collagen IV molecules in 
basement membranes exist mainly in three forms: [α1(IV)]₂[α2(IV)], 
[α3(IV)][α4(IV)][α5(IV)], and [α5(IV)]₂[α6(IV)] (Borza et al., 2001; 
Sado et al., 1998), although other combinations have been suggested 
(Kahsai et al., 1997; Kalluri and Cosgroves, 2000). 

Type IV collagen can assemble into a stable three-dimensional basement 
membrane network via three types of interactions. In the first type, pairs of 
C-terminal globules join together to form dimers (Fig. 1C). Studies of the 
[α1(IV)]₂[α2(IV)] form suggested that the dimeric globular domain is formed 
by disulphide exchange between corresponding cystines of the two 
monomeric domains (Siebold et al., 1988). It was suggested that monomers 
would dimerize intracellularly through regulation of the redox conditions, 
which may represent the first step of higher-order assembly. This assembly 
step appeared to be driven by interactions of cysteines present in the 
globular sequences which form the dimers by disulphide exchanges in the 
α1(IV) globular domains (Yurchenco and O’Rear, 1993). Recent results 
from crystallographic experiments did not confirm this hypothesis, but 
suggested that C-terminal interactions may be stabilized by a putative covalent 
Met-Lys crosslink (Than et al., 2002), mediated by metal ions, or that 
they are the result of posttranslational modifications (Vanacore et al., 2004). 

It has been proposed that the three different forms of collagen IV arising 
from the combination of the six α-chains interact at the C-termini in a 
restricted way: [α1(IV)]₂[α2(IV)] associates with another [α1(IV)]₂[α2(IV)] 
or with a [α5(IV)]₂[α6(IV)]; and [α3(IV)][α4(IV)][α5(IV)] associates only 
with another [α3(IV)][α4(IV)][α5(IV)] (Borza et al., 2001; Boutaud et al., 
2000; Gunwar et al., 2001; Sundaramoorthy et al., 2002). The C-terminus 
would play a fundamental role in the assembly of chain-specific networks 
during both the initial alignment and selection of a-chains for molec- 
ular assembly, and during the selection and connection of molecules for 
network assembly (Sundaramoorthy et al., 2002). Investigations of the 
[α1(IV)]₂[α2(IV)] form indicated that, in the second kind of interaction, 
four N-terminal ends bind to each other to form a 28-nm end-overlapped 
domain known as the 7S domain (Fig. 1D). Assembly proceeds through 
intermediates made of two or three antiparallel monomers held together 
through cooperative noncovalent interactions. A hydrophobic region on 
each N-terminal end has been identified from the primary sequence. The 
maximization of the hydrophobic contacts of the N-termini from four anti- 
parallel molecules explains the antiparallel interactions with correct axial and 
azimuthal orientation. The complex is limited to tetramer size by the orienta- 
tion arising from these interactions; cysteine and lysine/hydroxylysine resi- 
dues are placed in the correct positions on the corresponding chains to form 
disulfide and nonreducible crosslinks (Siebold et al., 1987). The above two 
interactions can be considered the basis for network formation. 

In the third kind of interaction, Type IV collagen dimers self-interact 
through lateral (side-by-side) associations (Fig. 1E; Timpl and Brown, 
1996; Yurchenco and Furthmayr, 1984; Yurchenco and Schittny, 1990). 
This interaction was first identified in vitro where assemblies showing 
extensive irregular networks were seen to form. The irregular polygonal 
geometry of the network suggests that collagen IV associates in more than 
one way, thus creating different spatial relationships between molecules. 
The existence of laterally associated networks (Fig. 1F) has been con- 
firmed in tissue basement membranes of the human amnion (Timpl 
and Brown, 1996). In the electron microscope, these collagenous networks 
appeared as an extensive irregular polygonal network with three- to five- 
arm branch points, and with strands 2.5 to 7 nm in diameter. The average 
distance between branch points was about 45 ± 30 nm; integral globular 
domains, the same shape and size as those of purified collagen dimers, 
were also seen. Lateral associations were seen between single collagen IV 
filaments, with the formation of branching strands of variable diameter. 
Unfortunately, the localization of the 7S domains was difficult in both the 
reconstructed polymers and in basement membranes, possibly because of 
superimposed lateral associations. The presence of supramolecular helices 
of collagen IV filaments was seen in amniotic networks. The similarity 
between reconstituted and tissue basement membrane collagen networks 
suggests that the information for assembly is encoded in the collagen 
molecules themselves. The covalent bonding of mammalian collagenous 
networks is formed at the N- and C-terminal regions, but the loops formed 
by these ends would be expected to irreversibly entrap and stabilize 
helically wrapped, laterally associated filaments. 


B. Type VI Collagen 


Collagen VI is a rodlike molecule about 105 nm long with two globular 
regions at the extremities (Fig. 2A; Furthmayr et al., 1983). The 
globular regions contain several domains homologous to von Willebrand 

Fig. 2. (A) Diagram of a collagen VI molecule. Collagen VI is a 105 nm long 
molecule with massive globular domains at the extremities. (B) Two molecules can 
interact laterally with a 30 nm axial shift to form dimers. Intertwining of the monomers 
in the adjacent regions is seen. (C) Dimers join laterally to form tetramers. (D) Cleaved 
globular domains join linearly, N-termini with C-termini and vice versa, but homotypic 
N- or C-terminal associations are not seen. (E) Subdomain structure of the three Type 
VI collagen chain N- and C-terminal globular regions (after Baldock et al., 2003). 


factor A-domains (Fig. 2E; Baldock et al., 2003). Collagen VI occurs mainly 
as a heterotrimer made of three different α-chains, but there is evidence 
for the existence of less stable assemblies comprising either α3(VI) or 
α1(VI)/α2(VI) chains. All three α-chains have triple-helical sequences of 
similar length, but the α1(VI) and α2(VI) chains are relatively small, with 
only two A domains at each end. The α3(VI) chain is much larger because 
of a prominent N-terminal region containing 10 A domains (Fig. 2E) and 
a C-terminus containing a proline-rich repeat, a fibronectin-like repeat, 
and a Kunitz protease inhibitor (Baldock et al., 2003). Type VI collagen 
microfibrils are stabilized by intra- and intermolecular disulphide bonds 
(Ayad et al., 1994). Monomers are assembled intracellularly, first into 
antiparallel dimers and then into tetramers (Fig. 2B, C; Ayad et al., 
1994). Purified molecules are also seen to aggregate as antiparallel dimers. 
The monomers associate in an antiparallel fashion with a 30-nm axial shift. 
Dimers consist of an inner thick rodlike region 75 nm long, in which the 
two monomers are seen to intertwine four or five times, with two shorter 
(30 nm) and thinner rodlike segments emerging axially from the inner 
region (Fig. 2B; Furthmayr et al., 1983). The sequences of the triple-helical 
parts of the Type VI collagen chains have been analyzed in detail, and it 
has been suggested (Knupp and Squire, 2001) that the three chains can 
twist into a segmented supercoil, details of which will be given below in 
Section III.
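
As a quick check of the dimensions quoted above, the 30-75-30 nm pattern follows directly from two 105 nm rods (the monomer length given in Fig. 2) overlapping antiparallel with a 30 nm axial shift. The Python sketch below is illustrative only: the function name is hypothetical, and treating each monomer as a rigid rod with no allowance for the globular domains is an assumption made here rather than part of the cited work.

```python
# Idealised dimer geometry: two rigid rods of equal length, antiparallel,
# offset axially by a fixed shift. Lengths are in nanometres.
MONOMER_NM = 105.0  # monomer length quoted in the Fig. 2 caption
SHIFT_NM = 30.0     # axial shift between the two monomers


def dimer_segments(monomer_nm, shift_nm):
    """Return (outer, inner, outer) segment lengths of the shifted dimer."""
    overlap = monomer_nm - shift_nm  # region where both rods lie side by side
    return (shift_nm, overlap, shift_nm)


segments = dimer_segments(MONOMER_NM, SHIFT_NM)
total = sum(segments)  # end-to-end length of the idealised dimer
print(segments, total)
```

Under these assumptions the inner overlapped region is 75 nm long and each protruding outer segment is 30 nm long, matching the spacing reported for dimers, tetramers, and beaded filaments.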

The globular ends of the two monomers are positioned at the ex- 
tremities of the inner region and outer segments. Dimers can associate 
side-by-side with their 30-nm long outer segments to form tetramers 
(Fig. 2C; Furthmayr et al., 1983; Von Der Mark et al., 1984). The formation 
of dimers also occurs through the interaction of the metal ion-dependent
adhesion site in the α2C2 A-domains in the C-terminal region with a GER
sequence found in the α2(VI) chain (Ball et al., 2003). Experiments
carried out on cleaved N- and C-globular domains showed that they join 
linearly, side-by-side, one type connected to the other (Fig. 2D; Kuo et al., 
1995). In addition, the tetramers are seen to interact to form long chains 
(beaded filaments). Dimers, tetramers, and polymeric chains all have 
30-75-30 nm spacing between the globular ends (Furthmayr et al., 1983; 
Von Der Mark et al., 1984; Wu et al., 1987). Collagen VI is found to bind 
to hyaluronan (Kielty et al., 1992), biglycan and decorin (Wiberg et al., 
2001), fibrillin (Ueda and Yue, 2003), and other matrix constituents. 

Because of the variety of ways in which collagen VI molecules can 
interact with each other and with several matrix constituents, they are good
candidates to form networks. Lateral association of beaded filaments
(likely to be collagen VI filaments) with their globular domains in register
was reported as early as the 1960s (Luse, 1960) in association with brain
tumors. More recently, more compact networks of collagen VI were seen in 
the eyes of people suffering from full-thickness macular holes (Fig. 3A, B; 
Knupp et al., 2000; Reale et al., 2001), from age-related macular degenera-
tion (Fig. 3C, D; Knupp et al., 2002a), and from Sorsby’s fundus dystrophy
(Fig. 3E, F; Knupp et al., 2002b). Although these ocular structures can be 
easily interpreted in terms of collagen VI tetramers, they differ in detail 
depending on the particular illness with which they are associated. Figs. 3A
and B illustrate a typical appearance of the collagen structure found in
Fig. 3. (A) Collagen VI network in the vitreous of a patient suffering from full-
thickness macular holes. Transverse bands 30 nm apart (double arrows) repeat axially
with a 100 nm periodicity. Axial filaments run through the bands (chevron). (Bar 
represents 10 nm and applies to all parts of the figure). (B) Fourier-filtered image of 
(A). (C) Collagen VI network in the retina of a patient suffering from age-related 
macular degeneration. In addition to the structural characteristics of the aggregates 
described in (A), extra transverse bands of protein density run between the main pairs 
of bands (single arrows). (D) Fourier-filtered image of (C). (E) Collagen VI network in 
the retina of a patient suffering from Sorsby’s fundus dystrophy. Here the transverse 
double bands are fused into a single broad band (double arrows). (F) Fourier-filtered 
image of (E). (Figure from Knupp and Squire, 2003.) 


the vitreous of a patient suffering from full-thickness macular holes 
(Knupp et al., 2000). Pairs of transverse bands, 30 nm apart (arrows), 
are repeated axially every ~100 nm. These bands arise from the globular 
domains at the extremities of the collagen VI tetramer outer segments. 
Filaments of protein density (chevron) run axially through the pairs of 
bands. In some views, their lateral periodicity is about 25 nm in the bigger 
gap between the transverse bands, and half that much in the smaller gap 
between the transverse bands. The filaments are staggered laterally by a 
half-unit cell when traversing pairs of bands. Cross-sectional views of this
structure showed dots of protein density arranged on a body-centered 
square array with 25 nm sides. 

The model proposed for the collagen VI assembly in these structures 
(Fig. 4A) consists of collagen VI tetramers (or pairs of tetramers) inter- 
acting with four other tetramers (octamers) via their globular domains. 
These globular interactions involved C-termini interacting with N-termini 
and vice versa. In doing so, the outer segments of two collagen VI tetra- 
mers belonging to two different levels lie side by side, and give rise to the 
axial filaments seen in the smaller gap between pairs of transverse bands. 
In cross-section, the tetramer inner segments give rise to the axial fila- 
ments in the bigger gap between pairs of bands. Tetramers belonging to 
one level lie on a square lattice with 25 nm sides. Those belonging to the 
next level up (or down) also lie on a square lattice having the same 
dimensions, but are shifted laterally by a half-unit cell so that the tetramers 
are positioned in the center of the square lattice in the adjacent levels. 
This conformation accounts for both the lateral shift of the axial filaments 
in the bigger gaps between bands, and the body-centered square lattice 
seen in transverse views. A similar arrangement can be envisaged for the 
collagen VI assemblies associated with age-related macular degeneration 
(Fig. 3C, D). 
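
The packing just described can be made concrete with a small geometric sketch. The Python below places idealised tetramer axes on a square lattice with 25 nm sides at one level and shifts the next level by half a unit cell in both lateral directions, so that the two levels together project onto a body-centered square array. The function name, the patch size, and the reduction of each tetramer to a single point are assumptions made for illustration only.

```python
import math

LATTICE_NM = 25.0  # side of the square lattice quoted in the text


def level_positions(level, n=3, a=LATTICE_NM):
    """Lateral (x, y) positions of tetramer axes at a given axial level.

    Odd levels are shifted by half a unit cell in x and y, so they sit at
    the centers of the squares formed by the even levels.
    """
    offset = a / 2.0 if level % 2 else 0.0
    return [(i * a + offset, j * a + offset) for i in range(n) for j in range(n)]


even = level_positions(0)
odd = level_positions(1)
projection = even + odd  # transverse view: both levels superimposed

# Shortest lateral distance between axes on adjacent levels.
nearest = min(math.dist(p, q) for p in even for q in odd)
print(round(nearest, 2))
```

In this idealisation the closest approach between axes on adjacent levels is half the diagonal of the square, that is 25/√2 nm, which is the spacing expected between a corner and the center of a body-centered square cell.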

In addition to the transverse bands of protein density seen in the aggre- 
gates described above, some of the assemblies associated with age-related 
macular degeneration present an extra pair of bands situated between and 
equidistant from the main pairs of bands (Fig. 3C, D, single arrows). The 
profiles of the main and subsidiary pairs of bands are the same, although 
the subsidiary pair is less electron dense. Models were developed to account 
for the subsidiary pairs of bands (Fig. 4B), and the preferred structure 
involved the presence of collagen VI dimers alongside the collagen 
VI tetramers. The dimers are required to be shifted axially by about 
50 nm, and their globular domains would give rise to the secondary set of 
pairs in projection. It is likely that the dimer/tetramer association is 
mediated by retinal components. Extra material interacting with the tetra- 
mer outer segments, but not with those of the dimers, can account for the 
presence of additional mass that makes the main pairs of bands more 
electron dense. 

In the collagen VI aggregates associated with Sorsby’s fundus dystrophy, 
extra material—analogous to that described above, and tentatively identi- 
fied as tissue inhibitor metalloproteinase 3 (TIMP3; abundant and defec- 
tive in people suffering from this disease)—was proposed to bind to the 
outer segment of tetramers. It would thus participate in the further 
aggregation of collagen VI tetramers (Fig. 3E, F; Fig. 4C). 
Fig. 4. Models for the collagen VI tetramer packing superimposed on one-
dimensional averages of the electron densities from the structures described in
Fig. 3A, C, and D, with a summary of the possible collagen VI structural interactions that 
form the building blocks that make networks. Collagen VI tetramers are represented in 
red. N- and C-termini are represented in blue and orange. Dimers are in light blue. 
(A) Left, model for the assembly associated with full-thickness macular holes; the 
packing occurs through the globular domain interactions, N-terminals interacting with 
the C-terminals and vice versa. Right, extra material (yellow bars) binding to the 
collagen VI tetramer outer segments could increase the electron density in this region. 
(B) Model for the assembly associated with age-related macular degeneration; the extra 
pairs of bands may arise from globular proteins binding to the tetramer triple helices 
(left) or more likely from the globular domains of dimers associating laterally with the 
tetramers (right). (C) Model for the assembly associated with Sorsby’s fundus 
dystrophy. TIMP3 molecules (represented by blue circles) associate with the tetramer 
outer segments and fill the space between the bands. (D) Collagen VI tetramer. (E) As 
in (D), but with extra material binding to the tetramer outer segments (yellow bars). 
(F) As in (E), but with globular proteins that may account for the formation of extra 
bands of protein density. (G) As in (D), but with TIMP3 associating with the tetramer 
outer segments. It may well be that TIMP3 is the extra material represented in 
(E). (H) Collagen VI dimers. (I) Linear interaction of two dimers through the dimer 
outer segments to form a chain. (J) Lateral associations of chains of dimers with 
tetramers. Extra material associating with the tetramer outer segments may explain 
the greater electron density seen in correspondence of the main pairs of bands. (Figure 
from Knupp and Squire, 2003). 
The different aspects that collagen VI networks present in the eyes of
patients suffering from different ocular diseases suggest that collagen VI 
can interact with a variety of protein and retinal components (Fig. 4D). 
Recent immunolabeling experiments of collagen VI networks in the tra- 
becular lamellae of the corneoscleral meshwork found that these networks 
contain, in addition to collagen VI, both fibrillin and decorin (Ueda and 
Yue, 2003). In vitro experiments on purified collagen VI molecules showed
that, in the presence of biglycan, they can assemble into hexagonal
networks (Wiberg et al., 2001). The ability of collagen VI to interact with 
other matrix molecules is shared with collagen IV in the basement mem- 
brane, which is also able to bind to a number of basement membrane 
components. 


C. Type VIII Collagen 


Type VIII collagen is found in Descemet’s membrane, the basement
membrane that separates corneal endothelial cells from the stroma. It is
also synthesized by vascular endothelial cells, and by epithelial
and mesenchymal cells of other tissues (Ayad et al., 1994). Two alpha chains,
α1(VIII) and α2(VIII), have been seen and it has been suggested that, in
Descemet’s membrane, the molecule exists in the [α1(VIII)]2[α2(VIII)]
form (Furthmayr et al., 1983; Shuttleworth, 1997). However, in vitro studies
showed that homotrimers made of either three α1(VIII) or three α2(VIII)
chains can be formed and are stable. During the same experiments, the
[α1(VIII)][α2(VIII)]2 form was also created (Illidge et al., 2001; Sutmuller
et al., 1997). The variation in the chain composition of type VIII collagen
may give rise to subtle functional differences in tissues (Shuttleworth, 1997). The
triple-helical domains of the α1(VIII) and α2(VIII) chains contain eight
imperfections in the Gly-X-Y repeat. Early biosynthetic studies showed that
secretion of type VIII collagen was independent of the requirement for
prolyl hydroxylation, in contrast to the fibrillar collagens. In the micro-
scope, metal-shadowed collagen VIII molecules from cultured rabbit cor-
neal endothelial cells showed rodlike structures of length 132 (±5) nm with
a large globular domain at one end (noncollagenous domain 1; NC1) and 
a smaller one at the other (NC2; Shuttleworth, 1997). The C-terminal NC1 
domain crystallizes as a trimer, which is thought to help to form or stabilize 
triple helix formation in the molecule (Kvansakul et al., 2003). The crystal 
structure also reveals three surface strips of partially solvent-exposed hydro- 
phobic residues, which may play a significant role in the formation of the 
hexagonal supramolecular assemblies described below. 

The organization of type VIII collagen that has been studied most 
closely is that in Descemet’s membrane, where it forms a hexagonal lattice 
with electron-dense nodes at the vertices of the hexagons, which have 160
(±15) nm sides (Shuttleworth, 1997). The hexagonal lattice structure of
Descemet’s membrane presumably represents a structural solution to the 
problem of creating a matrix that can resist compression and maintain an 
open porous structure (Ninomiya et al., 1990). 

Recombinant [α1(VIII)]3 and [α2(VIII)]3 molecules form highly
ordered supramolecular assemblies. These assemblies may be formed by 
four triple-helical collagen VIII molecules that come together to form a 
tetrahedron via the hydrophobic patches on their C-termini. It has also 
been suggested that these tetrahedral structures may further associate to 
form hexagonal lattices, with the N-terminals of individual molecules 
interacting with either the N-terminals or with the triple-helical portion 
of molecules in other tetrahedrons (Stephan et al., 2004). 

Collagen VIII appears to be involved in differentiation and tissue re- 
modeling. It appears to be secreted by rapidly proliferating cells and can 
be found in basement membranes. It was suggested that collagen VIII acts
as a bridge between different types of matrix molecules (Ricard-Blum et al., 
2000; Shuttleworth, 1997; Sutmuller et al., 1997), and it has been found to 
stimulate smooth muscle cell migration and matrix metalloproteinase 
synthesis (Hou et al., 2000). Collagen VIII is present in brain, placenta, 
heart, lung, and thymus of embryonic and neonatal murine tissue. It is 
also seen in vascular tissue, arterioles, and venules in muscle, heart, 
kidney, spleen, liver, lung, tendons, and cartilage matrix (Ricard-Blum 
et al., 2000). 


D. Type X Collagen 


Type X collagen is a short-chain collagen that is regulated in time and 
location during fetal development. It is synthesized by hypertrophic chon- 
drocytes during enchondral bone formation and, since the extracellular 
matrix surrounding chondrocytes mineralizes to be replaced by bone 
marrow and bone, it was suggested that collagen X may be associated with 
the mineralization process (Ayad et al., 1994; Sutmuller et al., 1997).
Collagen X has been shown to interact with chondrocytes
and other cells, primarily via their α2β1 integrins (Luckman et al., 2003).

Type X collagen is made of three α1(X) chains. Its triple helix is about
132 nm in length, with a small N-terminal domain and a large C-terminal 
globular domain. As in the case of the collagen VIII NC1 domain, which 
has a very similar sequence (Kvansakul et al., 2003), the crystal structure 
of the Type X C-terminal domain reveals on its surface three strips
containing eight partially aromatic residues that, because of their apolar 
nature, are probably important for supramolecular aggregation (Bogin
et al., 2002). Also like collagen VIII, the triple-helical domains of the α1(X)
chain contain eight imperfections in the Gly-X-Y repeat. The amino acid
sequences of collagen X and collagen VIII α-chains are, in fact, so similar
that hybrid molecules combining shortened α1(VIII) chains and α1(X)
chains have been created (Illidge et al., 2001). 

Type X collagen seems to be deposited in the cartilage matrix without 
preprocessing (Ayad et al., 1994). In vitro, Type X monomers are seen to 
aggregate and to form multimeric clusters arranged as a lattice. The lattice 
consists of a hexagonal array of nodules, presumably arising from the 
globular domains, interconnected by a filamentous network formed by 
the triple-helical parts of the collagen X molecules. The average distance 
between two nodules within the hexagonal lattice is about 100 (±15) nm,
a distance shorter than the measured length of 130 nm of the triple-helical 
domain of a Type X collagen molecule (Kwan et al., 1991). It was suggested 
that the short nodule-to-nodule distance may be in part because of the 
formation of superhelical structures among the adjacent helical domains. 
If this is the case, the mechanism giving rise to supercoiling must be
different from that proposed for the collagen VI segmented supercoil
(Knupp and Squire, 2001; see below). In collagen VI, supercoiling is 
driven by patches of hydrophobic amino acids coiling around the collagen 
triple helix. This is a direct consequence of the heterotrimeric nature of 
collagen VI, of which there is no evidence in collagen X since only one 
α1(X) chain has been discovered so far. Lateral interactions between
helical domains within individual multimeric clusters were also observed 
(Kwan et al., 1991). 


E. Dogfish Egg Case Collagen 


Evidence for the collagenous nature of the dogfish egg case came from 
X-ray diffraction studies that indicated the existence of a 2.9 Å meridional
arc typical of collagens, the thermal shrinkage characteristic curve 
(S-shaped with a mean half shrinkage temperature of 78 °C), and amino 
acid composition analysis showing that glycines accounted for about 16% 
of the amino acid residues (Knight and Hunt, 1974; Rousaouen et al.,
1976). The dogfish egg case collagen was purified and observed in the 
electron microscope after metal shadowing. It appears to be about 45 nm 
long with a large globular domain at one end (about 4 nm in diameter) 
and a much smaller one at the other (Knight et al., 1996). The short 
molecular length of the dogfish egg case collagen is thought to contribute 
to its ability to form liquid crystalline phases and crystalline networks. 

Luong et al. (1998) sequenced, at least partially, the dogfish egg case 
collagen protein extracted from the nidamental gland and from newly 
formed eggs. From their study, the dogfish egg collagen appears to form
from two peptides in which repeated Gly-X-Y regions occur. The collage- 
nous peptides, 34 and 36 kDa respectively, shared the same N-terminal 
sequence and might be two variants with different α-chain composi-
tions. Homologies in sequence with Type IV and Type X collagen were
recognized. 

Before the formation of the egg case wall, the dogfish egg case collagen 
is stored and secreted by specialized glands (Knight et al., 1996). The 
collagen assembles within the Golgi apparatus and migrates through the 
cytoplasm within storage granules. These undergo maturational changes 
as the collagen molecule aggregation state varies through different liquid 
crystalline phases. Finally, it flows through secretory ducts to the spinner- 
ets, where it starts to form fibrils (Knight et al., 1993, 1996). The liquid 
crystalline changes of the collagen molecules in the storage granules are 
thought to be driven by pH variations generated by proton pumps and by 
changes in the water content (Feng and Knight, 1994; Knight et al., 1996). 
These phases have been classified by Knight et al. (1993) as a poorly
ordered micellar phase (Phase I), a transversely banded lamellar phase 
(Phase II), a cholesteric mesophase without transverse banding (Phase 
III), a hexagonal-columnar phase (Phase IV), a second poorly oriented 
micellar phase (Phase V), and a highly ordered arrangement found in the 
mature egg case (Phase VI). In vitro studies carried out at pH 8, which
presumably is close to the physiological pH of the egg case, generated 
granules showing material containing only three phases: Phase IV (hexag- 
onal columnar; Fig. 5A), Phase VI (the mature egg arrangement; Fig. 6C), 
and a new arrangement with centrosymmetrically banded fibrillar material 
showing a period of 17.5 nm (Phase VII; Feng and Knight, 1994). 

A three-dimensional reconstruction was carried out for two of these 
different molecular arrangements: the hexagonal columnar phase (Phase 
IV; Knupp et al., 1999) and the arrangement in the mature egg case 
(Knupp et al., 1996, 1998). In the hexagonal columnar phase (Fig. 5A),
the collagen presents good conformational order if seen in transverse 
view, but poorer order in longitudinal views. The hexagons seen in cross- 
section (Fig. 5B) measure about 20 nm on each side. The corners appear 
more dense than the rest (chevron) and an area of protein density occurs 
in the middle of the hexagons (arrowhead). Strands of protein density are 
seen joining the corners and the middle of each hexagon. The recon- 
structed images show axial columns of protein density lying regularly on 
the vertices of hexagonal cells (Fig. 5C, D). These columns are connected 
to the three nearest neighbors by sheets of protein. Distinguishable pro- 
tein strands within sheets show a preferred molecular direction of about 
40 to 50 degrees to the long axis (Fig. 5C, arrows). The packing scheme 
Fig. 5. Hexagonal columnar liquid crystals made of dogfish egg case collagen.
(A) Typical view of a storage granule. (B) Fourier-filtered image of a transverse view. 
The corner of each hexagon appears relatively dense and massive (chevron); a lower
protein density is seen in the center of each hexagon (arrowhead). Spikes of protein
density radiate irregularly from the center of each hexagon (arrows). (C)
Longitudinal view of the three-dimensional reconstruction of the hexagonal columnar 
collagen arrangement. Protein strands show a preferred molecular direction of about 
40 to 50 degrees to the long axis. (D) Transverse view. Axial columns of protein density 
lie on the vertices of hexagonal cells and are connected by irregular sheets of proteins. 
(E) Basic unit of the collagen arrangement in the hexagonal phase of dogfish egg case 
collagen. The heads of three collagen molecules join with the tails of three other 
molecules (arrows). Likely interruptions of the triple helices (chevrons) may endow the 
molecules with the necessary flexibility to form networks. (F) Transverse view and (G) 
longitudinal view of the molecular arrangement proposed in the hexagonal columnar 
phase. One basic unit is highlighted in dark gray. Occasionally, molecules can aggregate 
differently (light gray molecules) and give rise to the protein density seen in the middle 
of the hexagons (Figure from Knupp and Squire, 2003). 
Fig. 6. (A) Typical view of the collagen arrangement within the dogfish egg case.
(Bar, 100 nm.) (B) Fourier synthesis of a longitudinal view with collagen octamers 
superimposed. One such octamer is color-coded to highlight single monomers (red and 
yellow molecules) and their axial shift. A dimer is drawn in blue. (Bar, 10 nm) (C) 
Three-dimensional reconstruction of the collagen arrangement within the egg case. 
(Bar, 10 nm). (D) Two-dimensional diagram of the collagen arrangement in the 
dogfish egg case. Two monomers are highlighted in red and yellow, and their laterally 
interacting parts in orange. Monomers associate laterally to form octamers, with four 
molecules axially shifted by 15 nm. Interactions between different octamers occur via 
their globular domains. (E) As in (D), but in three dimensions. (Figure from Knupp
and Squire, 2003). 


proposed for the collagen molecules in the hexagonal columnar phase 
was based on the interaction of the three α-chains forming the molecule
(Fig. 5E-G, dark gray molecules in F and G). The heads of three molecules
join together with the tails of three other molecules presumably through 
hydrophobic interactions (Fig. 5E, arrows). Likely interruptions of the
collagen Gly-X-Y sequence allow the molecule to kink (chevrons) and 
assume a 40 to 50 degree tilt to the long axis. Occasionally some molecules 
may aggregate differently and cross the hexagonal unit cell, giving rise to 
the areas of protein density in the middle of the hexagons (Fig. 5F, G, gray 
molecules). 

The arrangement of the molecules in the mature egg case (Fig. 6A) is very 
different from that described above. In longitudinal sections, pairs of bands 
of protein density about 15 nm apart are repeated axially approximately 
every 45 nm (Fig. 6B, C). Filaments of protein density run axially through 
the bands. In some views, the filaments that are separated laterally by 10 nm 
in the big gap between pairs of bands are staggered laterally by 5 nm when 
traversing the pairs of bands. In the small gap between bands, the lateral 
periodicity of the filaments is 5 nm. Cross-sectional views of the egg case 
show that the filaments are arranged on a body-centered square lattice with 
10 nm sides. Consideration of the unit cell symmetry suggests that each 
filament in the big gap between bands is made of eight molecules, with 
four molecules axially shifted by ~15 nm with respect to the other four 
(Fig. 6D, E). The globular domains at the extremities of the octamers 
interact with other octamer globular domains and, in projection, give 
rise to the transverse bands. An axial overlap of the octamers gives rise to 
the doubled lateral periodicity of the filaments in the small gap between the 
bands in longitudinal view. In cross-section, groups of octamers are
arranged on a square lattice with 10 nm sides; octamers belonging to 
different levels are laterally shifted, so that those at one axial level are in 
the middle of the square lattice of the levels above and below. This arrange- 
ment is extraordinarily similar to that seen in the collagen VI ocular 
aggregates described above. 


III. INTERTWINING OF NETWORK COLLAGENS: 
THE TYPE VI SEGMENTED SUPERCOIL


As discussed above, the fact that Type VI collagen contains more than 
one chain type makes it possible for the triple-helical regions of the three 
chains in this collagen to aggregate systematically to form a segmented 
supercoil (Knupp and Squire, 2001). Since it is possible that other network 
collagens can adopt a similar fold (e.g., along parts of Type IV collagen), 
the suggested origins of this supercoiling are briefly outlined here. 

Figure 7 shows the basis of the analysis by Knupp and Squire (2001). 
Study of the properties of the central 75 nm domain was carried out by
separately analyzing the contributions of the positively and negatively 
charged residues, the hydrophobic residues, and also the numerous 
Fig. 7. Sequence character along the Type VI collagen overlap region and
illustration of the segmented-coil model of dimer formation (Knupp and Squire, 
2001). (A) Fourier syntheses of the polar, hydrophobic, and proline residue 
distributions in the 75 nm long region of the collagen VI dimer using the principal 
peaks in the Fourier transform of those particular amino acids in the sequence. Code: 
blue, the charge distribution along a molecule; red, the same charge distribution 
reversed as for the second (antiparallel) molecule; white, the proline distribution of 
each molecule; yellow, the hydrophobic distribution of each molecule. The proline and 
hydrophobic distributions are symmetrical so the illustrated distributions apply to both 
members of the antiparallel pair. The distribution of polar residues is asymmetric and 
complementary across the midline so that areas of high positive charge density along 
one molecule (blue line) are in register with minima (areas of high negative charge 
density) of the other molecule (red line). Charge/proline distributions and 
prolines in positions X and Y of each triplet. In all cases, a property score
was either determined for each chain individually or was summed over the
three chains for further analysis, the latter being termed the α123 mole-
cule. The total amino acid array length was about 250 residues. It was
found that the summed polar amino acid scores for the three α-chains
gave a main peak in the Fourier transform at 46.5 amino acids, with a
second weaker peak at about 36.5 residues. The summed proline scores
for the three α-chains had a main repeat at 23.3 amino acids, and the
summed hydrophobic amino acid profile repeated at 21.3 amino acids. 
The discontinuities in the tripeptide sequences (in other words, the 
absence of glycine at the expected position, or the insertion or deletion 
of an amino acid giving a phase change in the Gly-X-Y sequence) occurred 
at approximate intervals of 44-48 amino acids apart from one missing 


hydrophobic distributions are in phase towards the extremities of the 75 nm long 
region, but out of phase in the central part. Green arrows indicate the positions of the 
interruptions along the two portions of the triple helix. These generally occur in the 
middle of proline-rich regions. (B) Hydropathicity profiles in the 75 nm long region of 
the Type VI dimers. In the three panels, the curve underlying the light blue areas is the 
sum of the hydrophobic contribution of all three α-chains. In dark blue, red, and yellow
are highlighted the individual contributions from the α1, α2, and α3 chains,
respectively. The hydrophobic contributions are not evenly spread along the three
chains; the main contributions are from α3 towards the N-terminal end (left), from α1
towards the C-terminal end and from α2 in the middle, with slightly overlapping
contributions here from α1 and α3. These regions are highlighted by yellow, blue, and
red background shading respectively. Black arrows highlight the 21-22 amino acid 
repeat of the total hydrophobic (light blue) distribution. (C) Radial projection of the 
collagen triple helix in the 75 nm long region of Type VI collagen that contributes to 
dimerization. This radial projection can be thought of as being produced by wrapping a 
piece of paper into a cylinder around the collagen triple helix, and then tracing the 
positions of the single α-chains onto it; the α1, α2, and α3 chains are shown in blue,
red, and yellow, respectively. The black circles represent the locations of the main 
hydrophobic regions on the triple helix (their repeat is indicated by the black arrows). 
These amino acids lie on a track corresponding to a left-handed superhelix of pitch 
37.5 nm. (D) Two antiparallel molecules (one black and the other green) can only 
bring together the hydrophobic patches along the corresponding two superhelices if 
the molecules twist around each other. (E) Diagram of the 75 nm long region 
contributing to dimerization on two antiparallel molecules. Red and blue regions show 
bands of net +ve or −ve charge; gray patches show regions of high proline density.
Green arrows highlight regions where the Gly-X-Y sequence is interrupted; yellow 
arrows show the cystines. The positions of the main hydrophobic patches are shown by 
ovals around the triple helix (black in front, gray behind). (F) Model of the collagen VI 
segmented-supercoil showing the complementary polar interaction of the molecular 
bands illustrated in (E) using the same colors. The hydrophobic patches in (E) are now 
buried on the central line of contact of the two molecules (Bar, 50 amino acids or 
15 nm, applies to all parts of the Figure). 
discontinuity. Clearly, most of these repeats (polar 46.5, proline 23.3,
disruptions 44-48) are closely related; only that of the hydrophobic 
residues (21.3) does not fit in. 
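
To illustrate how a dominant repeat can be read from the Fourier transform of a residue-property score, the following Python sketch scans a range of trial periods and reports the one with the largest Fourier amplitude. It is a minimal illustration under stated assumptions: the score array is synthetic (a pure cosine with a 23.3-residue period over 250 residues), not the collagen VI sequence, and the function names and the period-scanning approach are introduced here for illustration and are not taken from Knupp and Squire (2001).

```python
import cmath
import math


def amplitude_at_period(scores, period):
    """Fourier amplitude of a mean-subtracted score array at one trial period."""
    mean = sum(scores) / len(scores)
    total = sum(
        (s - mean) * cmath.exp(-2j * math.pi * n / period)
        for n, s in enumerate(scores)
    )
    return abs(total) / len(scores)


def dominant_period(scores, periods):
    """Trial period (in residues) with the largest Fourier amplitude."""
    return max(periods, key=lambda p: amplitude_at_period(scores, p))


# Synthetic stand-in for a per-residue property score (assumption, not data).
N_RESIDUES = 250
TRUE_PERIOD = 23.3
synthetic = [math.cos(2 * math.pi * n / TRUE_PERIOD) for n in range(N_RESIDUES)]

# Trial periods from 15.0 to 50.0 residues in steps of 0.1.
trial_periods = [15 + 0.1 * k for k in range(351)]
best = dominant_period(synthetic, trial_periods)
print(round(best, 1))
```

Scanning non-integer periods directly, rather than using only the harmonics of the array length, is a design choice that suits short arrays of about 250 residues, where the repeats of interest (for example 21.3 or 23.3 residues) do not divide the array length exactly.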

Fourier syntheses of the polar, hydrophobic, and proline residue dis- 
tributions in the 75-nm long region of the collagen VI dimer obtained 
using the principal peaks in the Fourier transform of those particular 
amino acids in the sequence are illustrated in Fig. 7. Taking the first 
two 23.3 amino acid proline repeats at the N-terminus as a starting point 
(Fig. 7A), the negative charges dominate in the first of these repeats 
(proximal to the N-terminus) and the positive charges dominate in the 
second. This pattern then repeats along the molecule. The sequence 
discontinuities occur regularly in the middle of every second proline 
repeat. The hydrophobic repeat is such that it starts at the N-terminal 
end (left hand edge of Fig. 7A) more or less in phase with the proline 
repeat, but then the hydrophobics get gradually out of step with the 
prolines, so that they are in antiphase halfway along the molecule and 
are back exactly in phase at the right hand (C-terminal) end. In summary, 
the hydrophobic pattern and the proline pattern have approximate mirror 
symmetry around the center of this sequence, whereas the distribution of 
charges is asymmetric and complementary. 
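
The drift between the hydrophobic and proline repeats can be checked with simple arithmetic. The sketch below assumes the repeat lengths quoted above (21.3 and 23.3 residues) and an axial rise of 0.3 nm per residue, the value implied by the scale bar of Fig. 7 (50 amino acids or 15 nm). The variable and function names are illustrative.

```python
HYDROPHOBIC_REPEAT = 21.3            # residues, as quoted in the text
PROLINE_REPEAT = 23.3                # residues, as quoted in the text
RISE_PER_RESIDUE_NM = 15.0 / 50.0    # assumption: taken from the Fig. 7 scale bar

def beat_length(p1, p2):
    """Number of residues over which two repeats drift by one full cycle."""
    return p1 * p2 / abs(p1 - p2)

in_phase_again = beat_length(HYDROPHOBIC_REPEAT, PROLINE_REPEAT)
antiphase_at = in_phase_again / 2

print(f"antiphase after ~{antiphase_at:.0f} residues "
      f"({antiphase_at * RISE_PER_RESIDUE_NM:.1f} nm)")
print(f"back in phase after ~{in_phase_again:.0f} residues "
      f"({in_phase_again * RISE_PER_RESIDUE_NM:.1f} nm)")
```

Under these assumptions the two repeats return to phase after roughly 248 residues, or about 74 nm, which is close to the 75-nm length of the dimerization region and consistent with the approximate mirror symmetry described above.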

Taking the amino acid distribution in one molecule and trying to match 
against it the appropriate features of the adjacent molecule in the pair 
gave rise to the following logic. First, if the interacting molecules are 
parallel, they must be axially shifted to produce complementary charge 
interactions, but then hydrophobic residues with a different repeat will 
inevitably be out of step in the two molecules and the cystine residues will 
also not line up. On the other hand, if the interacting molecules are 
antiparallel, then to produce complementary charge interactions they 
could either be overlapped by 75 nm as in Fig. 7D and E, or they could 
be stepped in either direction by a small multiple of the 48 polar amino- 
acid repeat. However, if the molecules are not overlapped by 75 nm, then 
the hydrophobic patches are no longer in optimal register. In fact, the 
only structure that brings opposite charges into alignment and also at the 
same time optimizes apolar interactions is the 75 nm overlapped anti- 
parallel dimer. This also brings the cystine residues at the ends of the 
tripeptide regions of the two chains into axial alignment. 

Further analysis of the separate chains shows that the α1 chain contains 
a higher than average number of negative charges, the α3 chain contains a 
higher than average number of positive charges, and the α2 chain is close 
to neutral. When analyzing the hydrophobic residues in individual chains 
(Fig. 7B), it was found that the main contribution to the total molecular 
hydropathicity towards the N-terminal end is from the α3 chain, at the 
C-terminal end is from the α1 chain, and in the middle is from α2, with 
slightly overlapping contributions from α1 and α3. If the amino acid 
characteristics had been evenly spread along all three chains, then there 
would have been no evidence for supercoiling. Assuming that the molecular 
interactions are stabilized primarily by apolar interactions, the fact 
that the sites of greatest hydropathicity follow roughly helical tracks along 
the molecules (Fig. 7C-E), and that they shift from one chain to another, 
suggests that supercoiling must occur. The α123 molecule shown in Fig. 7F 
would give a supercoil of pitch about 37.5 nm (one half of 75 nm). 

Since the sequence discontinuities occur precisely in step with the 
proline and polar amino acid repeats, one can imagine a pseudo-repeating 
unit comprising: [break-short proline region-negative patch-long proline 
region-positive patch-short proline region-break]. However, the fact that 
the hydrophobic repeat does not coincide with this main structural 
sequence means that, in terms of their apolar character, every one of 
these repeats is different. The presence of the repeats separated by 
discontinuities gives the supercoil a clearly segmented character, hence 
the preference for the name segmented supercoil for this collagen structure. 
This segmentation is also emphasized by the fact that regions of high 
proline content are likely to be rather rigid, whereas the polar-rich regions 
are likely to be more flexible. The curvature along the molecules would 
not be expected to be constant, but to vary slightly within each segment. 

Finally, note that, as in all other supercoiled molecules (e.g., the 
α-helical coiled coil), the sense of the twist changes from the Type VI 
molecular supercoil (left-handed) to the supercoiling of the individual 
collagen chains in the 10/3 helix (right-handed) to the twist within each 
chain (left-handed). This is characteristic of “regular lay” rope structures 
and produces strength and rigidity in the assembly. 


IV. COMMON STRUCTURAL THEMES IN 
NETWORK-FORMING COLLAGENS 


The different members of the network-forming collagen family vary 
considerably with respect to their amino acid sequence and dimensions. 
Nevertheless, it is possible to recognize common structural patterns in the 
way they assemble. Figure 8 summarizes various structural characteristics 
of collagen IV, VI, and dogfish egg case collagen. These three kinds of 
collagen are all capable of linearly associating via their globular domains 
(Fig. 8A-C), and axially through their triple-helical chains (Fig. 8D-F). 

Collagen IV associates linearly, end-to-end, via its C-termini (C-terminus 
with C-terminus), while collagen VI and dogfish collagen appear to form 
this kind of association via a heterotypic aggregation of their globular 
domains (C-terminus with N-terminus). Further, the three types of collagen 
associate laterally by antiparallel interactions. Type IV collagen is also 
seen to associate via its 30-nm-long N-terminals (Fig. 8C) and, in addition, 
by intertwining of two or more collagen molecules (Fig. 8F). Collagen VI 
molecules associate laterally to form antiparallel dimers and tetramers 
(Fig. 8E). Collagen VI monomers are axially shifted by about 30 nm when 
forming dimers and, as discussed in Section III, they also intertwine 
around each other, in this case forming a segmented supercoil (Fig. 8E). 

FIG. 8. Summary of the structural characteristics of collagen IV, VI, and dogfish egg 
case collagen. These collagens can interact linearly end-to-end through their N- and 
C-termini (represented in green and purple, respectively). Dogfish egg case and 
collagen VI interactions are represented in (A), and collagen IV interactions in (B) and 
(C). In addition, they all interact laterally to form antiparallel dimers. In (D), the 
dogfish egg collagen interactions are represented. At present, it is not known whether 
monomers associate with a 15 nm axial shift to form dimers, or if an axial shift occurs 
between already formed tetramers. Although the resolution reached does not allow a 
direct visualization, it is conceivable that supercoiling occurs between monomers. 
Lateral interactions of collagen VI are represented in (E) and those of collagen IV in 
(F). Networks are formed when linear and lateral interactions take place: the dogfish 
egg case collagen network is represented in (G), the collagen VI network in (H), and 
the collagen IV network in (I). Collagen VI is also capable of forming long chains when 
tetramers associate linearly (H, top). 

Finally, the dogfish egg case collagen associates axially into octamers. As 
with collagen VI, this association involves an axial displacement of four 
monomers by about 15 nm with respect to the other four (Fig. 8D). 
Although, at the resolution achieved in the structural studies of the egg 
case carried out so far, it was not possible to distinguish any molecular 
supercoiling, it cannot be excluded that some kind of molecular 
intertwining is formed also in this case, since the resulting structure would 
presumably possess a functionally desirable increase in strength. 

Note that a clear implication of the analysis of the segmented supercoil 
structure is that it will not occur if the three chains in the molecule are 
identical (e.g., Type X collagen). It may well be possible to generate an 
analogous supercoil with two chain types (e.g., the Type IV versions 
[α1(IV)]2[α2(IV)] or [α5(IV)]2[α6(IV)], and the Type VIII version 
[α1(VIII)]2[α2(VIII)]) or in parts of the three-chain version of Type IV, 
[α3(IV)][α4(IV)][α5(IV)], but this has yet to be demonstrated. 

The way collagens VIII and X assemble has not been studied in detail 
yet, but it appears that they too form molecular assemblies via globular 
interactions that involve more than two molecules. In particular, because 
the Type X collagen hexagonal arrangement does not account for the 
dimensions of the molecules (they have a smaller periodicity than 
expected), it was suggested that there might be axial overlapping of 
molecules. Dogfish egg case collagen forms a hexagonal structure as well, and 
it is likely that hexagonal symmetry is a consequence of the intrinsic 
threefold character of the collagen molecules due to their triple-helical 
conformation. 

Linear and axial interactions of the kind described above appear funda- 
mental in the formation of collagen networks. The interactions formed by 
C- and N-terminals help to form linear successions of molecules, while the 
axial interactions are important in the generation of open networks since 
they involve molecules belonging to different levels. In addition, inter- 
twining of different collagen molecules may be another common theme. 
This behavior is not present in other collagens (e.g., collagen I). 
Interruptions in the Gly-X-Y sequence are a common characteristic of the 
network-forming collagens. This may help to reduce the stiffness of the 
collagen molecule, allowing more spatial freedom for the molecule, and 
may also promote supercoiling. Finally, collagen IV and VI seem to be able 
to interact with many other proteins and macromolecules. It is probable 
that this characteristic is used in strengthening the networks formed by 
these collagens, as well as having other physiological implications. 


ABSTRACT 


Fibrillin microfibrils are widely distributed extracellular matrix assemblies 
that endow elastic and nonelastic connective tissues with long-range elasticity. 
They direct tropoelastin deposition during elastic fibrillogenesis and form an 
outer mantle for mature elastic fibers. Microfibril arrays are also abundant in 
dynamic tissues that do not express elastin, such as the ciliary zonules of the 
eye. Mutations in fibrillin-1—the principal structural component of micro- 
fibrils—cause Marfan syndrome, a heritable disease with severe aortic, ocular, 
and skeletal defects. Isolated fibrillin-rich microfibrils have a complex 56 nm 
“beads-on-a-string” appearance; the molecular basis of their assembly and 
elastic properties, and their role in higher-order elastic fiber formation, 
remain incompletely understood. 


ADVANCES IN PROTEIN CHEMISTRY, Vol. 70 (Kielty et al.) 
Copyright 2005, Elsevier Inc. All rights reserved. 
DOI: 10.1016/S0065-3233(04)70012-1 0065-3233/05 $35.00 


I. INTRODUCTION 


Fibrillin microfibrils are widely distributed elastic microfibrils that en- 
dow connective tissues with long-range elasticity (Handford et al., 2000; 
Kielty et al., 2002a). They are evolutionarily highly conserved from jellyfish 
to man, which confirms their critical biomechanical importance. Micro- 
fibrils are particularly abundant in elastic tissues such as aorta, lung, and 
skin, where they direct tropoelastin deposition during elastic fibrillogen- 
esis and form an outer mantle for mature elastic fibers (Kielty et al., 2002b; 
Mecham and Davis, 1994). Microfibril arrays are also abundant in dynamic 
tissues that do not express elastin, such as the ciliary zonules of the eye that 
hold the lens in dynamic suspension (Ashworth et al., 2000). Structural 
analysis of isolated fibrillin-rich microfibrils has revealed a complex 56 nm 
“beads-on-a-string” appearance (Fig. 1A), while the length of a fibrillin 
monomer is ~160 nm (Lin et al., 2002; Sherratt et al., 2001). Mutations in 
fibrillin-1 cause Marfan syndrome (MFS), a heritable disease associated 
with severe aortic, ocular, and skeletal defects due to defective elastic 
fibers, as well as several related fibrillinopathies (Robinson and Booms, 
2001). 

In man, although there are three fibrillin genes, fibrillin-1 is the major 
structural component of microfibrils. Fibrillin molecules are large 
(~340 kDa) and complex multidomain glycoproteins. Fibrillin-1 and 
fibrillin-2 have distinct but overlapping patterns of expression (Zhang et al., 
1995). Fibrillin-2 is generally expressed earlier in development than 
fibrillin-1 and may be particularly important in elastic fiber formation (Zhang 
et al., 1994). Fibrillin-3 was isolated from brain, and it is not known 
whether it forms microfibrils (Corson et al., 2004; Nagase et al., 2001). In 
this Chapter, we delineate the structural organization of fibrillin-1 
molecules and microfibrils, and evaluate current models of fibrillin-1 alignment 
within microfibrils. 


II. FIBRILLIN MOLECULES 


The three closely related fibrillin isoforms have distinct, but overlap- 
ping, developmental and adult tissue distributions. Human fibrillin-1 
consists of 2871 amino acids, with a calculated pI of 4.81 and a predicted 
molecular mass of 347 kDa (Pereira et al., 1993) (Fig. 1B). There are 14 
N-linked glycosylation consensus sequences that, in recombinant fragments, 
are occupied (Bax et al., 2003; Rock et al., 2004). The signal peptide is 
located at the extreme N-terminus and consists of 27 amino acids. 
Fibrillin-1 contains 47 epidermal growth factor (EGF)-like domains, of 
which 43 are calcium-binding (cbEGF)-like domains. It also contains seven 
8-cysteine (TB) modules, two hybrid motifs with similarities to both 
cbEGF-like domains and TB motifs, and a proline-rich region that may act 
as a hinge region. In fibrillin-2, this sequence is glycine-rich (Zhang et al., 
1994); in fibrillin-3, it is proline- and glycine-rich (Corson et al., 2004). 
N- and C-terminal furin cleavage sites imply key roles for pericellular 
processing in fibrillin-1 assembly. There is extreme evolutionary 
conservation of fibrillin-1 among mammalian species (human, porcine, 
bovine, murine). There is also remarkable nucleotide identity within the 
5’ flanking sequence, suggesting a regulatory role for this region. 

FIG. 1. (A) Ultrastructural appearance of isolated fetal bovine aorta fibrillin 
microfibrils. Using intermittent contact mode atomic force microscopy, beads are 
identified as repeating height maxima (white) with an axial periodicity of 56 nm. (Bar, 
500 nm.) (B) Domain structure of N- and C-termini processed human fibrillin-1. 
Consensus N-linked glycosylation sequences and the potential N-terminal MAGP-1 
binding site are depicted by blue circles and a cyan diamond, respectively. Domains are 
colored according to experimentally determined antibody binding positions. (C) 
Localization of fibrillin antibody epitopes by Western blotting of recombinant protein 
fragments. 


A. cbEGF-Like Domains 


EGF-like domains are a commonly identified protein module in multi- 
domain proteins (Downing et al., 1996; Smallridge et al., 2003). They 
contain six cysteine residues that form intradomain disulphide bonds in 
a 1-3, 2-4, and 5-6 manner. A subset of these domains contains a calcium 
binding consensus sequence of (D/N)-X-(D/N)-(E/Q)-Xm-(D/N)*-Xn-(Y/F), 
where m and n are variable, and the asterisk indicates a potential 
β-hydroxylation site (here designated cbEGF-like domains). cbEGF-like 
domains often exist as tandem repeats, and calcium binding is critical 
for their structural integrity and function (Downing et al., 1996; Werner 
et al., 2000). More than 60% of the mutations causing Marfan syndrome 
identified to date occur within fibrillin-1 cbEGF-like domains, empha- 
sizing that correct folding of these domains is critical for molecular 
function. Calcium plays a general role in maintaining structural rigidity 
of the interdomain linkage, and stabilizes cbEGF-like domain pair structures 
(Downing et al., 1996). Tandem repeats of fibrillin-1 cbEGF-like 
domains are predicted to form rod-shaped structures in the presence of 
calcium. 
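
The calcium binding consensus given above, (D/N)-X-(D/N)-(E/Q)-Xm-(D/N)*-Xn-(Y/F), translates naturally into a regular expression. The following Python is a minimal sketch only: the upper bounds placed on m and n are assumptions made for illustration (the text states only that they are variable), the function name is hypothetical, and the test sequence is invented rather than taken from fibrillin-1.

```python
import re

# (D/N)-X-(D/N)-(E/Q)-Xm-(D/N)-Xn-(Y/F).
# Assumption: m and n are each allowed to range from 1 to 12 residues.
CB_EGF_CONSENSUS = re.compile(r"[DN].[DN][EQ].{1,12}?[DN].{1,12}?[YF]")

def find_calcium_binding_motifs(sequence):
    """Return (1-based start position, matched segment) for each hit."""
    return [(m.start() + 1, m.group())
            for m in CB_EGF_CONSENSUS.finditer(sequence)]

# Invented toy sequence, used only to exercise the pattern.
toy_domain = "DIDECESSPCINGVCKNSPGSFICEC"
hits = find_calcium_binding_motifs(toy_domain)
print(hits)
```

A match to this pattern only flags a candidate site. It does not check the six-cysteine EGF framework or establish that calcium is actually bound.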

High-resolution structures for several fibrillin-1 cbEGF-like domains 
have been determined by nuclear magnetic resonance (NMR) and X-ray 
crystallography (Downing et al., 1996; Lee et al., 2004; Smallridge et al., 
2003). In solution, covalently linked cbEGF-like domains (cbEGF32-33) 
were orientated in a near-linear extended arrangement, which was stabilized 
by calcium ligation of the C-terminal domain and interdomain 
hydrophobic packing interactions. The NMR solution structure of another 
calcium-loaded fibrillin-1 cbEGF-like domain pair was also solved. 
cbEGF12-13 also exhibits a near-linear, rodlike arrangement of domains, 
which suggests that all fibrillin-1 cbEGF-cbEGF pairs characterized by a 
single interdomain linker residue possess this rodlike structure. The do- 
main arrangement of cbEGF12-13 was stabilized by additional interdomain 
packing interactions to those observed for cbEGF32-33. The cbEGF12-13 
domain pair is within the longest run of cbEGFs, and many mutations that 
cluster in this region are associated with severe, neonatal MFS (Robinson 
and Booms, 2001). There is a correlation between backbone flexibility 
and calcium binding affinity. For both tandem cbEGF domain pairs, the 
N-terminal domain binds calcium more weakly than the C-terminal do- 
main. However, the affinities of the N- and C-terminal domains of 
cbEGF12-13 for calcium are significantly higher than those observed for 
cbEGF32-33, so there is also variation between domain pairs in terms of 
calcium affinities in relation to flexibility. Mutations in cbEGF-like do- 
mains that affect cysteine residues are likely to alter disulfide bond forma- 
tion, thereby disrupting the correct fold, while mutations affecting 
residues in the calcium binding consensus sequence reduce calcium 
binding affinity, leading to structural destabilization. Other mutations 
could impair domain folding. Missense mutations that change calcium 
binding ligands cause increased proteolytic susceptibility of recombinant 
fibrillin fragments (Booms et al., 2000; Reinhardt et al., 2000a; Whiteman 
and Handford, 2003). 


B. TB Motifs 


Fibrillin TB motifs (also known as eight cysteine motifs) are so-called 
because of their homology to motifs found in the latent TGFβ binding 
proteins (LTBPs), some of which can covalently bind the cytokine TGFβ. 
TB motifs are found only in the fibrillin-LTBP superfamily of proteins and 
contain eight cysteine residues, including a contiguous internal cluster of 
three cysteine residues within an α-helical region. The eight cysteine 
residues disulphide bond in a 1-3, 2-6, 4-7, 5-8 pattern. In fibrillin-1, 
six of these motifs are interspersed among the cbEGF-like domains, and a 
seventh precedes the proline-rich region. The TB motif is a potential 
source of flexibility within fibrillin-1 through its interactions with flanking 
cbEGF-like domains. The structure of the fibrillin-1 TB6 motif itself was 
established by NMR (Yuan et al., 1997). It is globular and comprises two 
α-helices and six β-strands. Strands A, D, E, and F form a four-stranded 
β-sheet, while the B and C strands form a two-stranded β-hairpin. 
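
The disulphide connectivities quoted for EGF-like domains (1-3, 2-4, 5-6) and TB motifs (1-3, 2-6, 4-7, 5-8) are stated in terms of cysteine order, not residue number. As an illustrative sketch, the Python below maps such an ordinal pattern onto residue positions in a domain sequence. The helper name and the toy sequence are assumptions; the sequence is invented and merely contains eight cysteines, three of them contiguous.

```python
# Ordinal cysteine pairings as quoted in the text.
EGF_PAIRING = [(1, 3), (2, 4), (5, 6)]
TB_PAIRING = [(1, 3), (2, 6), (4, 7), (5, 8)]

def disulphide_pairs(sequence, pairing):
    """Map an ordinal cysteine pairing onto 1-based residue positions."""
    cys_positions = [i + 1 for i, aa in enumerate(sequence) if aa == "C"]
    n_expected = max(max(pair) for pair in pairing)
    if len(cys_positions) != n_expected:
        raise ValueError(
            f"expected {n_expected} cysteines, found {len(cys_positions)}")
    return [(cys_positions[a - 1], cys_positions[b - 1]) for a, b in pairing]

# Invented toy sequence with cysteines at positions 2, 6, 10, 11, 12, 16, 20, 24.
toy_tb = "GCAAGCAAGCCCAAGCAAGCAAGCG"
tb_bonds = disulphide_pairs(toy_tb, TB_PAIRING)
print(tb_bonds)
```

The same helper applied with `EGF_PAIRING` to a six-cysteine sequence gives the EGF-like connectivity. A sequence with an unpaired extra cysteine, such as the nine-cysteine first hybrid motif, would deliberately be rejected by the length check.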

The NMR study of fibrillin-1 TB6-cbEGF32 also identified a flexible 
linker between these domains (Yuan et al., 1997). The crystal structure 
of a cbEGF22-TB4-cbEGF23 fragment from fibrillin-1, which contains an 
Arg-Gly-Asp (RGD) integrin recognition site, has also been determined in 
Apo (one-calcium bound) and Holo (two-calcium bound) forms (Lee et al., 
2004). This study revealed novel pairwise interactions mediated by TB4, 
one of which may contribute to fibrillin-1 elasticity (see Section V below). 
Fibrillin-1 TB4 is recognized by integrin receptors α5β1 and αvβ3 in an 
RGD-dependent manner (Bax et al., 2003). The structure of the third TB 
module in LTBP-1 (TB3-LTBP1) has also been solved (Lack et al., 2003). 
Unlike fibrillin TB motifs, TB3-LTBP1 contains a Phe-Pro insertion that 
renders the 2,6 disulphide bond more easily accessible for protein-protein 
interaction with the TGFβ1 propeptide. The 2,6 disulphide bond can form 
a complex with the TGFβ1 propeptide (LAPβ1) and it is not required 
for domain folding. Other unique features revealed from the structure 
of TB3-LTBP1 are two hydrophobic patches that may be involved in intra- 
or intermolecular protein-protein interactions, and five negatively 
charged residues surrounding the 2,6 disulphide bond that result in a 
large electrostatic surface potential. 


C. Hybrid Motifs 


Hybrid motifs are apparently unique to fibrillins and LTBPs. They 
contain primary structural features of both TB motifs towards their 
N-termini and cbEGF-like domains towards their C-termini. No 
high-resolution structures have yet been determined for these motifs. Most 
hybrid motifs contain eight cysteines, but the first hybrid motif of 
fibrillin-1 contains nine cysteines; one of these cysteines is predicted to be 
important in forming intermolecular disulphide bonds that stabilize 
assembled fibrillin-1 (see Section III below; Reinhardt et al., 2000b). 


D. Proline/Glycine-Rich Regions 


Fibrillin-1 contains a proline-rich region (42% proline content), in 
contrast to fibrillin-2 (which contains a glycine-rich region) and fibrillin- 
3 (which has both a proline- and glycine-rich region). While there is 
limited sequence homology between these regions, all have the potential 
to fold, which may have been an evolutionary requirement for this region. 
No Marfan-causing mutations have been identified within the proline-rich 
region of fibrillin-1, which implies a critical role. 


E. N- and C-Termini 


The N-terminal region of fibrillin-1 (the first 29 residues after signal 
peptide) contains largely basic residues (estimated pI, 11.1). There is a 
putative furin cleavage site at the N-terminus (RAKR|R; residues 
41-45). Recombinant secreted N-terminal fibrillin-1 can be furin 
processed (Raghunath et al., 1999; Ritty et al., 1999; Wallis et al., 2003). 
In our mammalian recombinant expression system, a significant level of 
secreted N-terminal fragments is not cleaved at this site, although this 
could reflect high expression. Our mass spectrometry studies of isolated 
microfibrils have failed to reveal any tryptic peptides from fibrillin-1 
sequences upstream of the N-terminal furin cleavage site (Cain et al., 
2004). 

The C-terminal sequence after the last cbEGF-like domain comprises 
184 residues and contains a major furin cleavage site (RKRR|; residues 
2728-2732). The unique sequence prior to the cleavage site contains a 
CxxC sequence that could have protein disulphide isomerase (PDI) 
activity, and thus the ability to (re)arrange disulphide bonds. The pro- 
cessed C-terminal unique sequence contains three N-linked glycosylation 
sites just after the furin processing site, which can regulate furin proces- 
sing (Ashworth et al., 1999a). On pericellular furin cleavage, a 20 kDa 
N-glycosylated fragment is released. Furin processing appears to be essen- 
tial for microfibril deposition (Raghunath et al., 1999). However, using 
mass spectrometry analysis, we have shown that tissue-isolated microfibrils 
from zonules, skin, and aorta all contain at least some C-terminal frag- 
ments, as judged by detection of tryptic peptides from the post-furin 
cleavage site sequence (Cain et al., 2005). Thus, the significance of 
C-terminal processing in assembly will need to be reevaluated. 
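
Both cleavage motifs quoted in this section (RAKR at the N-terminus and RKRR at the C-terminus) fit the commonly cited furin recognition pattern Arg-X-(Lys/Arg)-Arg, with cleavage after the final arginine. As a minimal sketch, the Python below scans a sequence for candidate sites of that form. The function name is hypothetical and the two test sequences are invented fragments built around the quoted motifs, not the actual fibrillin-1 sequence.

```python
import re

# Commonly cited furin recognition pattern R-X-(K/R)-R.
# A lookahead is used so that overlapping candidate sites are all reported.
FURIN_SITE = re.compile(r"(?=(R.[KR]R))")

def furin_sites(sequence):
    """Return (1-based position of the residue preceding the scissile bond,
    four-residue motif) for every candidate site."""
    return [(m.start() + 4, m.group(1)) for m in FURIN_SITE.finditer(sequence)]

# Invented fragments containing the motifs quoted in the text.
n_term_like = "MAAGRAKRRGSSTT"
c_term_like = "TTRKRRSS"
print(furin_sites(n_term_like))
print(furin_sites(c_term_like))
```

A sequence match indicates only a potential site. As the text notes, whether cleavage occurs also depends on factors such as nearby N-glycosylation and expression level.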


F. Molecular Organization 


Low-resolution imaging of fibrillin-1 molecules and molecular fragments 
has been achieved by rotary shadowing (Lin et al., 2002; Reinhardt 
et al., 1996a). By this approach, fibrillin-1 appears as relatively flexible 
rodlike molecules of ~160 nm with evidence for several small globular 
domains. In the presence of calcium, fibrillin-1 molecules appear shorter 
with a wider diameter, and more flexible (Reinhardt et al., 1997). We have 
used atomic force microscopy to image unshadowed fibrillin microfibrils 
(Fig. 1A; Kielty et al., 2002a; Sherratt et al., 2004), as well as recombinant 
fibrillin-1 fragments (unpublished data). We are currently completing the 
solution structure of fibrillin-1 using small angle X-ray scattering. 

The amino acid sequence of fibrillin-1 is predominantly hydrophilic, 
even in the presence of bound calcium (Sherratt et al., 2004). In addi- 
tion, fibrillin-1 has 14 potential N-glycosylation sites that may contribute 
to the hydrophilic nature of fibrillin-1, since complex N-linked oligosac- 
charides are negatively charged if they contain sialic acid. Additional 
regions of increased hydrophilicity may be contributed by other microfibril 
components (see Section III below). 


III. FIBRILLIN ASSEMBLY 


Fibrillin-1 microfibrils are highly complex extracellular polymers, and 
the molecular basis of assembly is not well understood. Studies have 
focused on defining the role of cells in regulating assembly, the molecular 
interactions that drive assembly, and the role of microfibril-associated 
molecules in the assembly process. 


A. Role of Cells 


Microfibril assembly is, at least in part, a cell-regulated process that 
proceeds independently of tropoelastin. Fibrillin-1 may possibly undergo 
limited initial assembly in the secretory pathway (Ashworth et al., 1999b; 
Trask et al., 1999), and in this respect is similar to other major extracellular 
matrix (ECM) macromolecules such as collagens, laminins, and proteogly- 
cans. Intracellular chaperone associations are likely to play key roles in 
molecular folding and N-glycosylation (Ashworth et al., 1999b; Wallis et al., 
2003), and in preventing inappropriate intracellular assembly. Cells also 
regulate the pericellular location of N- and C-terminal cleavage by furin 
(Raghunath et al., 1999; Ritty et al., 1999). 

Microscopy studies have shown that assembly occurs in association with 
cell surfaces and predict a key role for receptors in this process. A central 
role for receptors has also been shown for fibronectin assembly, in which 
dimer interactions with α5β1 integrins induce a conformation change 
required for linear assembly (Sechler et al., 2001). Since the RGD sequence 
in the fourth fibrillin-1 TB motif interacts with αvβ3 (Bax et al., 2003; Lee 
et al., 2004; Pfaff et al., 1996; Sakamoto et al., 1996) and α5β1 (Bax et al., 
2003), these receptors might influence microfibril assembly in a similar 
manner. It has also been shown that heparan sulphate proteoglycans 
(HSPGs), possibly in the form of cell surface syndecan receptors, may have 
a role in assembly (Tiedemann et al., 2001). Pericellular chondroitin sul- 
phate proteoglycans may be needed for beaded microfibril formation (see 
below). Sulphation is a requirement for microfibril assembly, since chlorate 
treatment ablates microfibril and elastic fiber formation (Robb et al., 1999). 
This effect may reflect the absence of a proteoglycan, or undersulphation 
of fibrillins or the microfibril-associated molecule MAGP-1 (see below). 


B. Fibrillin-1 Homotypic Interactions 


The molecular form of fibrillin-1 secreted from cells is unclear. As is the 
case for most major ECM polymers, fibrillins can undergo limited intra- 
cellular assembly to form dimers or trimers that could be intermediates 
during extracellular assembly (Ashworth et al., 1999b; Lin et al., 2002; Trask 
et al., 1999). However, monomers that have been excluded from assembly 
are often detected in cell culture medium (Wallis et al., 2003; C. M. Kielty, 
unpublished data). Difficulties in expressing recombinant full-length 
fibrillin-1 have so far precluded detailed assessment of whether fibrillin-1 
is secreted predominantly as a monomer, and whether it can self assemble. 
Several models of fibrillin-1 alignment within assembled microfibrils have 
been proposed (see Section V below), and all support the concept that 
fibrillin-1 can assemble linearly in a head-to-tail fashion at the cell surface. 
Microfibril assembly must also involve controlled lateral fibrillin-1 packing. 

We initially investigated fibrillin-1 C-terminal processing and assembly 
using an in vitro translation system supplemented with semipermeabilized 
cells as a source of endoplasmic reticulum (Ashworth et al., 1999a). 
Processing of C-terminal fibrillin-1 was shown to be influenced by 
N-glycosylation immediately downstream of the furin site, and by associa- 
tion with calreticulin. Interestingly, the C-terminus of fibrillin-2 underwent 
less efficient processing than C-terminal fibrillin-1 under identical condi- 
tions. Differences in processing rates of these two fibrillin isoforms may 
reflect differential abilities to assemble into microfibrils. Size fractionation 
of the N-terminal regions of fibrillin-1 (encoded by exons 1-11 and 1-15), 
and of unprocessed and furin-processed C-terminal fragments of fibrillin-1 
(encoded by exons 50-65), revealed that the N-terminus formed abun- 
dant disulphide-bonded aggregates. Some association of unprocessed 
C-terminal fibrillin-1 was also apparent, but processed C-terminal 
sequences remained monomeric unless N-terminal sequences encoded by 
exons 12-15 were present. These data indicated the presence of fibrillin-1 
molecular recognition sequences within the N-terminus and the ex- 
treme C-terminal sequence downstream of the furin site, and a specific 
N- and C-terminal association that could drive linear accretion of 
furin-processed fibrillin-1 molecules in the extracellular space. 

Other recombinant studies showed that recombinant N-terminal and 
other regions of fibrillin-1 have a tendency to dimerise (Ashworth et al., 
1999b; Trask et al., 1999). An unpaired cysteine residue in the first hybrid 
domain, which contains nine cysteines, is predicted to be involved in 
covalently linking N-terminally-aligned fibrillin-1 molecules (Reinhardt 
et al., 2000b). N- and C-terminal halves of the fibrillin-1 molecule, and 
the N-terminal half of fibrillin-2, were shown to interact with high affinity 
in solid-phase and BIAcore binding assays (Tiedemann et al., 2001). 
However, N- and C-terminal fibrillin-2 polypeptides did not interact with 
each other. These data demonstrated that fibrillins can directly interact in 
an N- to C-terminal fashion to form homotypic fibrillin-1 or heterotypic 
fibrillin-1/fibrillin-2 microfibrils. 
We have recently undertaken a detailed binding and kinetic analysis of 
fibrillin-1 homotypic interactions, using smaller recombinant fragments 
that span the entire coding region of human fibrillin-1, in order to define 
more precisely the fibrillin-1 sequences that interact (Marson et al., 2005). 
The N-terminal sequence encoded by exons 1-8 binds with low nM affinity 
to itself, and also binds a downstream sequence encoded by exons 9-17. 
The furin-processed C-terminus and the proteolytically released C-terminal 
20 kDa fragment both bind homotypically and tightly to the N-terminal 
fragment. No other homotypic interactions within fibrillin-1 were detected. 
MAGP-1, which also binds the N-terminus (see below), was found to inhibit 
the N- to C-terminal interaction, but not the N- to N-terminal interaction. 

In summary, the N- to C-terminal interactions may be a basis for linear 
fibrillin-1 assembly, while the N- and C-terminal homotypic interactions 
may be critical in lateral assembly. MAGP-1 may play a role in regulating N- to 
C-terminal linear fibrillin-1 assembly. 


C. Fibrillin-1 Interactions with Microfibril-Associated Molecules 


An extensive inventory of microfibril-associated molecules has been 
identified on the basis of gene mapping of heritable elastic fiber defects, 
mouse models, and detailed immunohistochemical and biochemical stud- 
ies of elastic tissues (Kielty et al., 2002a). These molecules can be categor- 
ized as those that colocalize or copurify with microfibrils, those that occur 
at the elastin-microfibril interface or the elastic-fiber-cell interface, and 
those that are involved in the process of elastic fiber formation. In vitro 
binding studies have confirmed that fibrillin-1 can interact with a number 
of matrix molecules. They include MAGPs-1 and -2, fibulins-1, -2, and A 
versican, and heparan sulphate. It is now clear that the fibrillin-1 
N-terminus is highly interactive. It not only binds itself and the C-terminus, 
but it can also interact strongly with several of these associated molecules. 
It is unclear whether these interactions are mutually exclusive. 

MAGP-1 (sometimes known as MFAP-2) is possibly the best candidate 
for an integral microfibril molecule (Gibson et al., 1989; Trask et al., 2000). 
It is associated with virtually all microfibrils and widely expressed in 
mesenchymal and connective tissue cells throughout development. 
Human MAGP-1 is a 183-residue molecule that has two distinctive domains: 
an acidic N-terminal half that is enriched in proline residues and 
has a clustering of glutamine residues, and a C-terminal portion that 
contains 13 cysteine residues and has a net positive charge. MAGP-1 has 
a matrix-binding domain (MBD) that targets this molecule to the ECM 
(Segade et al., 2002). MAGP-1 showed calcium-dependent binding to the 
fibrillin-1 N-terminus (Jensen et al., 2001; Rock et al., 2004). It localizes to 
microfibril beads (often two per bead) (Henderson et al., 1996; Kielty and 
Shuttleworth, 1997) and is probably disulphide bonded to microfibrils 
since reduction is required for its extraction. MAGP-1 and fibrillin-1 are 
also both substrates for transglutaminase (Brown-Augsberger et al., 1996; 
Qian and Glanville, 1997; Rock et al., 2004). MAGP-2, the only other 
member of this protein family, is a 170-173 residue protein structurally 
related to MAGP-1, mainly in a central region (Gibson et al., 1998; Segade 
et al., 2002). MAGP-2 is rich in serine and threonine residues and contains 
an RGD cell-recognition motif through which it binds to the αvβ3 integrin 
(Gibson et al., 1999). MAGP-2 localizes to elastin-associated and elastin- 
free microfibrils in a number of tissues (Gibson et al., 1998; Hanssen et al., 
2004). Its restricted patterns of tissue localization and developmental 
expression suggest that MAGP-2 is not a primary structural component 
of microfibrils, but it may be important in cell signaling during microfibril 
assembly and elastic fibrillogenesis. 

Fibrillin-1 interacts with tropoelastin with high affinity, an interaction 
that is likely to be critical in elastic fiber formation. Using binding assays, 
we have demonstrated high-affinity calcium-independent binding of two 
overlapping fibrillin-1 fragments (encoded by central exons 18-25 and 
24-30) to tropoelastin, which, in microfibrils, maps to an exposed “arms” 
feature adjacent to the beads (Rock et al., 2004). A further binding site 
within an adjacent fragment (encoded by exons 9-17) was within an eight- 
cysteine motif designated TB2 (encoded by exons 16 and 17). A novel 
transglutaminase crosslink between tropoelastin and a fibrillin-1 fragment 
(encoded by exons 9-17) was localized by mass spectrometry to a sequence 
encoded by exon 17. The high-affinity binding and crosslinking 
of tropoelastin to a central fibrillin-1 sequence confirm that this association 
is fundamental to elastic fiber formation. N-terminally bound MAGP-1 
may contribute not only to microfibril assembly, but also to tropoelastin 
deposition on microfibrils. 

Latent TGFβ-binding proteins (LTBPs) comprise repeating cbEGF domains 
interspersed with TB modules, and are thus members of the 
“fibrillin superfamily” (Oklu and Hesketh, 2000; Sinha et al., 1998). 
Specific TB modules in LTBP-1, LTBP-3, and LTBP-4 can bind to TGFβ 
intracellularly, forming a large latent complex that is secreted and crosslinked 
in the ECM by transglutaminase. Subsequent proteolytic release of 
the complex is necessary for TGFβ activation, so LTBPs play an important 
role in tissue targeting of TGFβ. LTBP-1 colocalizes with microfibrils in 
skin and cell layers of cultured osteoblasts and in embryonic long bone, 
but not cartilage (Dallas et al., 2000; Raghunath et al., 1998; Taipale et al., 
1996). Thus, LTBP-1 is unlikely to be an integral structural compo- 
nent of microfibrils. LTBP-2 colocalizes with fibrillin microfibrils in elastic 
fiber-rich tissues (especially in response to arterial injury) and in trabecular 
bone (Gibson et al., 1995; Kitahama et al., 2000; Sinha et al., 2002). It is a 
good candidate for an integral microfibril molecule, although it does not 
bind to TGFβ. The C-terminus of LTBP-3 was reported not to bind 
fibrillin-1 (Isogai et al., 2003), but it is not known whether LTBP-4 can 
bind fibrillin-1 (Giltay et al., 1997; Saharinen et al., 1998). 

Several proteoglycans are associated with microfibrils and contribute 
critically to their assembly and integration into the surrounding ECM. 
Heparin and heparan sulphate bind fibrillin-1 in at least three sites, and 
supplementing culture medium with heparin inhibits microfibril assembly 
(Ritty et al., 2003; Tiedemann et al., 2001). Versican, a large chondroitin 
sulphate proteoglycan of the lectican family, was immunolocalized to 
microfibrils in skin (Zimmermann et al., 1994). The versican C-terminal 
lectin domain binds N-terminal fibrillin-1 sequences (Isogai et al., 2002). 
However, its nonperiodic association with microfibrils indicates that versi- 
can is probably not an integral structural component. Instead, it may 
associate with microfibrils, and its negatively charged chondroitin sulphate 
chains may influence integration of microfibrils into the surrounding ECM. 
Polycationic dyes showed an association between proteoglycan and elastic 
fibers (Baccarani-Contri et al., 1990). Two members of the small leucine- 
rich PG family, decorin and biglycan, were detected within elastic fibers in 
dermis; biglycan mapped to the elastin core and decorin mapped to 
microfibrils. Decorin can interact with both fibrillin-1 and MAGP-1 individually, 
and together they form a ternary complex (Trask et al., 2000). Small 
chondroitin sulphate proteoglycans may be associated with microfibrils and 
contribute to their beaded organization (Kielty et al., 1996). 

Fibulin-2 is not crosslinked within microfibrils, but strongly binds a 
fibrillin-1 N-terminal sequence (within residues 45-450; exons 2-10) in 
a calcium-dependent interaction (Reinhardt et al., 1996b). Fibulin-1 and 
fibulin-2 interact with the versican C-terminal lectin domain, and fibulin- 
2 also binds to aggrecan and brevican (Olin et al., 2000). Fibulin-5 is a 
critical determinant of elastic fiber formation (see below). It binds strongly 
to tropoelastin in a calcium-dependent manner although reportedly not 
to fibrillin-1, and colocalizes with tropoelastin (Nakamura et al., 2002; 
Yanagisawa et al., 2002). 

Microfibril-associated protein-1 (MFAP-1; also known as AMP), MFAP-3, 
and MFAP-4 (also known as MAGP-36) colocalize with microfibrils and 
elastic fibers in skin and other tissues (Abrams et al., 1995; Hirano et al., 
2002; Horrigan et al., 1992; Lausen et al., 1999; Liu et al., 1997; Toyoshima 
et al., 1999). In aging and immune conditions, microfibrils can associate 
with amyloid deposits and accumulate a coating of adhesive glycoproteins, 
such as vitronectin (Dahlback et al., 1990). 
D. Microfibril Maturation 


There is some evidence for extracellular maturation of microfibrils, with 
the extracellular environment possibly playing a major role in regulating 
microfibril fate. In human dermal fibroblast cultures, monoclonal anti- 
body 11C1.3 (which binds to beaded microfibrils) does not detect micro- 
fibrils until about two weeks in culture, but a polyclonal antibody (PF2) to 
a fibrillin-1 pepsin fragment can detect abundant microfibrils within three 
days (Baldock et al., 2001). The time-dependent appearance of 11C1.3- 
reactive microfibrils suggests some maturation that might be due to 
conformational changes that unmask cryptic epitopes, transglutaminase 
crosslinking, or accretion of associated molecules. The relative tissue 
abundance of heparan sulphate and various microfibril-associated mole- 
cules may influence epitope availability and commit microfibrils to distinct 
extracellular fates. 


IV. MICROFIBRIL ORGANIZATION AND ELASTICITY 


A. Organization 


Early transmission electron microscopy (TEM) studies suggested that 
tissue microfibrils were tubelike structures with a diameter of 10-12 nm 
(Kielty and Shuttleworth, 1997). The most prominent feature of isolated 
microfibrils is their ‘‘beads-on-a-string’’ appearance with untensioned 
periodicity of ~56 nm (Keene et al., 1991; Kielty et al., 1991). This is seen 
after extraction using homogenization, bacterial or purified collagenase 
with or without hyaluronidase, and by rotary shadowing, negative staining, 
scanning transmission EM (STEM), atomic force microscopy (AFM), and 
cryoEM (Baldock et al., 2002; Kielty et al., 2002a,b). Interestingly, by quick- 
freeze deep etch microscopy, undisturbed zonular microfibrils appear to 
be more tubular, suggesting that molecular components are lost or that 
there is a major molecular rearrangement on extraction (Davis et al., 
2002). These structural differences between tissue and isolated microfi- 
brils are most prominent in the interbead region. 

STEM mass mapping of isolated microfibrils from a variety of devel- 
oping and adult tissues revealed repeating peaks and troughs of mass 
(Sherratt et al., 1997). There is abundant evidence for a shoulder of 
mass to one side of each bead, so microfibrils have an asymmetrical axial 
mass distribution. There is also evidence for asymmetric mass accretion 
during development. The axial mass distribution is statistically similar in 
all repeats, for each given microfibril population, implying that each 
repeat has comparable molecular composition and organization (Table I). 


TABLE I 


Fibrillin Microfibril Mass and Periodicity 


                               Periodicity  Mass per      Bead center    Interbead center 
                               (nm)         repeat (kDa)  mass (kDa/nm)  mass (kDa/nm)     References 

Fetal* 
  Bovine aorta                 56.1         2065          62.4           18.9              Sherratt et al. (1997) 
  Bovine nuchal ligament       56.1         2335          60.5           26.9              Sherratt et al. (1997) 
Adult* 
  Canine ciliary zonule        58.9         2550          55.0           28.3              Baldock et al. (2001) 
  Murine skin                  55.1         2547          59.8           33.5              Kielty et al. (1998); 
                                                                                           unpublished observations (MJS) 
Staggered† (N-gly + 8 MAGP-1) 
  One-third (domain 18)        54.0         923           19.7           14.6              Lee et al. (2004) 
  One-third (domain 19)        54.0         923           25.2           13.4              Lee et al. (2004) 
  One-third (domain 20)        54.0         923           22.1           13.9              Lee et al. (2004) 
  One-half (domain 28)         81.0         1385          25.0           15.4              - 
Hinged† 
  Monomer                      54.0         2369          62.7           14.7              Baldock et al. (2001) 
  Monomer (N-gly + 8 MAGP-1)   54.0         2772          66.9           18.0              - 


* Determined experimentally using STEM mass mapping. 
† Determined from theoretical axial mass distributions at a one-third stagger and a one-half stagger (overlap indicated by domain 
number), and for a hinged model in the presence or absence of N-linked glycosylation (N-gly) and associated MAGP-1. 
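
The figures in Table I can be compared more directly with a short calculation. The sketch below is illustrative only: it takes a subset of the tabulated values and derives two quantities that are not themselves reported in the table, a bead-to-interbead ratio (used here as an assumed, simple stand-in for "mass contrast") and the mean mass per nanometre over one repeat. The function names are hypothetical.

```python
# Rows transcribed from Table I:
# (periodicity nm, mass per repeat kDa, bead center kDa/nm, interbead center kDa/nm)
TABLE_I = {
    "fetal bovine aorta (STEM)": (56.1, 2065, 62.4, 18.9),
    "fetal bovine nuchal ligament (STEM)": (56.1, 2335, 60.5, 26.9),
    "adult canine ciliary zonule (STEM)": (58.9, 2550, 55.0, 28.3),
    "adult murine skin (STEM)": (55.1, 2547, 59.8, 33.5),
    "staggered model, one-third (domain 19)": (54.0, 923, 25.2, 13.4),
    "hinged model, monomer": (54.0, 2369, 62.7, 14.7),
}


def mass_contrast(bead_kda_per_nm, interbead_kda_per_nm):
    """Bead-to-interbead ratio; an assumed, simple measure of mass contrast."""
    return bead_kda_per_nm / interbead_kda_per_nm


def mean_mass_per_nm(mass_per_repeat_kda, periodicity_nm):
    """Average mass per unit length over one repeat (kDa/nm)."""
    return mass_per_repeat_kda / periodicity_nm


for name, (period, mass, bead, interbead) in TABLE_I.items():
    print(f"{name}: contrast {mass_contrast(bead, interbead):.2f}, "
          f"mean {mean_mass_per_nm(mass, period):.1f} kDa/nm")
```

On these numbers, the bead center mass of the hinged model lies within the experimental range, whereas the one-third staggered model falls well below it. This is in line with the mass arguments made in Section V.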
Calcium chelation and addition experiments have shown that bound 
calcium profoundly influences microfibril periodicity and organization 
(Cardy et al., 1998; Kielty and Shuttleworth, 1993; Wess et al., 1998a). In 
the absence of calcium, microfibril periodicity is reduced and microfibrils 
have a wider diameter and appear more flexible. We generated three- 
dimensional reconstructions of isolated microfibrils in the presence of 
calcium, based on automated electron tomography (Baldock et al., 2001). 
These data provided fine structural details and supported a fibrillin-1 
assembly model in which initial assemblies would undergo conformational 
maturation to a reversibly extensible beaded polymer (see Section V 
below), and also suggested a molecular explanation for microfibril exten- 
sibility. There was evidence for two arms emerging from one side of the 
beads, interbead twisting, and the beads appeared more heart-shaped than 
round. Antibody epitope mapping showed that the N- and C-termini were 
at either side of the bead, and that there is complex intramolecular 
folding in 56 nm microfibrils. Electron microscopy images of isolated 
microfibrils and detailed STEM mass mapping indicate that there are 
eight molecules in cross-section (Baldock et al., 2001; Wallace et al., 
1991). Dehydrated and hydrated isolated microfibrils all show the same 
major features. 

In tissues, microfibrils form loosely packed parallel bundles. X-ray fiber 
diffraction of hydrated zonular microfibril bundles identified one-third- 
staggered ‘‘junctions’’ that may modulate force transmission (Wess et al., 
1998a), and quick-freeze deep-etch analysis of zonules has identified 
molecular links between microfibrils (Davis et al., 2002). 


B. Elasticity 


Fibrillin-rich microfibrils have endowed tissues with elasticity through- 
out multicellular evolution. X-ray studies and mechanical testing of micro- 
fibril bundles showed that bound calcium influences load deformation, 
but is not necessary for high extensibility and elasticity (Eriksen et al., 2001; 
Wess et al., 1997, 1998a,b). Thus, tissue microfibril elasticity is modified by, 
but not dependent on, calcium-induced beaded periodic changes. This is 
consistent with the molecular folding model. 

We used molecular combing to determine Young’s modulus for indi- 
vidual microfibrils, and X-ray diffraction of zonular filaments of the eye 
to establish the linearity of microfibril periodic extension (Sherratt 
et al., 2003). Microfibril periodicity is not altered at physiological 
zonular tissue extensions, and Young’s modulus is between 78 and 96 MPa, 
which is two orders of magnitude stiffer than elastin. Thus, elasticity 
in microfibril-containing tissues arises primarily from reversible alterations 
in supramicrofibrillar arrangements rather than from intrinsic elastic 
properties of individual microfibrils, which instead act as reinforcing fibers 
in fibrous composite tissues. 
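
The stiffness comparison can be illustrated with a minimal linear-elastic sketch. It assumes simple Hookean behaviour (strain = stress / modulus), reads "two orders of magnitude" as a factor of exactly 100, and uses a hypothetical applied stress of 0.5 MPa. None of these assumptions come from the measurements themselves, and real tissues are not linear over large strains.

```python
MICROFIBRIL_MODULUS_MPA = (78.0, 96.0)  # range quoted in the text
STIFFNESS_RATIO = 100.0  # "two orders of magnitude", assumed to be exactly 100


def implied_elastin_modulus_mpa(microfibril_modulus_mpa, ratio=STIFFNESS_RATIO):
    """Elastin modulus implied by the quoted stiffness ratio (an assumption)."""
    return microfibril_modulus_mpa / ratio


def strain_at_stress(stress_mpa, modulus_mpa):
    """Linear (Hookean) strain produced by a given stress."""
    return stress_mpa / modulus_mpa


stress = 0.5  # MPa, hypothetical load chosen only for illustration
for modulus in MICROFIBRIL_MODULUS_MPA:
    elastin = implied_elastin_modulus_mpa(modulus)
    print(f"E = {modulus:.0f} MPa: microfibril strain "
          f"{strain_at_stress(stress, modulus):.4f}, "
          f"elastin strain {strain_at_stress(stress, elastin):.2f}")
```

Under the same load, the stiffer component deforms about a hundredfold less. That fits the view of individual microfibrils as reinforcing fibers rather than the main source of tissue extensibility.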

What can be the molecular explanation for microfibril elasticity? Three- 
dimensional microfibril reconstructions allowed us to develop a model of 
fibrillin alignment in extensible microfibrils in which intramolecular folding 
would act as a molecular “engine,” driving extension and recoil (Baldock 
et al., 2001; Kielty et al., 2002c, 2004). Our model predicts that maturation 
from initial parallel head-to-tail alignment (~160 nm) to an approximately 
one-third stagger (~100 nm) occurs by folding at the termini and the 
proline-rich region, which would align known fibrillin-1 transglutaminase 
crosslink sequences. Microfibril elasticity (in the range of 56-100 nm) would 
arise from further intramolecular folding at flexible sites, which could be 
links between specific TB motifs and cbEGF-like domains. The crystal struc- 
ture of fibrillin-1 TB4 and flanking cbEGF-like domains has led to another 
suggested model for microfibril elasticity, based on a staggered arrangement 
for fibrillin-1 within microfibrils with molecular movement around certain 
domain linkages (Lee et al., 2004) (see Section V below and Table I). 
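
The length changes in the folding model can be expressed as fractional changes. This is a sketch only, using the approximate lengths quoted above (~160, ~100, and ~56 nm); the function name is hypothetical.

```python
HEAD_TO_TAIL_NM = 160.0  # initial parallel head-to-tail alignment (approximate)
STAGGERED_NM = 100.0     # approximately one-third staggered form
UNTENSIONED_NM = 56.0    # untensioned beaded periodicity


def fractional_change(start_nm, end_nm):
    """Relative length change going from start_nm to end_nm."""
    return (end_nm - start_nm) / start_nm


# Compaction during maturation (negative values mean shortening).
print(f"160 -> 100 nm: {fractional_change(HEAD_TO_TAIL_NM, STAGGERED_NM):+.1%}")
print(f"100 -> 56 nm:  {fractional_change(STAGGERED_NM, UNTENSIONED_NM):+.1%}")

# Reversible extension across the proposed elastic range (56-100 nm).
print(f"56 -> 100 nm:  {fractional_change(UNTENSIONED_NM, STAGGERED_NM):+.1%}")
```

On these figures, the proposed elastic range corresponds to a reversible extension of roughly 79% of the untensioned periodicity.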


V. FIBRILLIN-1 ALIGNMENT MODELS 


There are two major current models of fibrillin-1 alignment in microfibrils: 
the “molecular hinge” model of unstaggered molecules and the 
one-third staggered arrangement. One driver in both models is the pres- 
ence of an intermolecular fibrillin-1 transglutaminase crosslink (Qian and 
Glanville, 1997). However, our recent mass spectrometry study has revealed 
that not every fibrillin-1 molecule is crosslinked within tissue microfibrils 
(Cain et al., 2005). 


A. Hinged Model 


Our model is based on detailed STEM mass mapping, automated 
electron tomography, and AFM data (Fig. 2; Baldock et al., 2001, 2002; 
Kielty et al., 2002c; Sherratt et al., 2003). It predicts maturation from an 
initial parallel head-to-tail alignment to an approximately one-third stagger 
(~100 nm) that would facilitate transglutaminase crosslink formation. 
The staggered form would further pack into a more energetically favorable 
~56 nm “untensioned” form. The model can account for known microfibril 
structural features and mass profiles (axial distributions, mass per 
repeat, mass contrast), is in agreement with the observed number of 
molecules in cross-section, and provides a rationale for microfibril elasticity 
through molecular (but not domain) unfolding. It does not require the 
presence of additional molecules, but can accommodate associated molecules. 
For example, tropoelastin could bind at the shoulder in developing 
tissues, and two MAGP-1 molecules could bind at each bead (as observed 
by immunogold EM) (Henderson et al., 1996; Kielty and Shuttleworth, 
1997). 


Fig. 2. Theoretical axial mass distributions (TAMD) calculated for the hinge model 
of fibrillin-1 microfibril structure (Baldock et al., 2001). The mass of each domain 
was determined from the primary amino acid sequence, with the addition of one 
Ca²⁺ (40 Da) to each calcium binding EGF domain (cbEGF). All domains were 
assumed to have an axial extent of 3 nm. (A) TAMD calculated for the primary amino 
acid sequence + Ca²⁺ of a single repeating unit (bead and interbead). (B) TAMD of 
a fully glycosylated single repeat (all putative N-linked glycosylation sites occupied 
with 2.46 kDa carbohydrate chains) and an N-terminal associated MAGP-1 (20.83 kDa). 
(C) Comparison of theoretical (primary sequence + Ca²⁺, 8 monomers in cross 
section) and experimental (fetal bovine aorta and nuchal ligament [NL]) axial mass 
distributions (Sherratt et al., 1997). (D) Schematic diagram of the intramolecular 
hinge model described in Baldock et al. (2001). The experimentally determined 
antibody binding positions are indicated on a schematic diagram of a microfibril. 
The position of a putative transglutaminase crosslink is represented as a black line. 

The epitopes of the fibrillin-1 monoclonal antibodies, 2502 and 11C1.3, 
were further characterized as shown in Fig. 1C. A library of overlapping 
recombinant fibrillin protein fragments (Rock et al., 2004) was screened 
by Western blot analysis. The epitope of antibody 2502 was found to be 
within exons 4-8 (amino acids 116-329) and the epitope of antibody 
11C1.3 was within exons 18-20 (amino acids 723-909). The position of 
these exons in the intramolecular hinge model is still consistent with 
the experimentally determined location of the antibodies on microfibrils 
(Fig. 2D). 


B. Staggered Model 


A one-third staggered arrangement has been suggested by extrapolation 
of molecular dimensions (Downing et al., 1996) and organization from the 
crystal structure of fibrillin-1 domain arrays (Lee et al., 2004) (Fig. 3). Our 
calculations show that this model would require ~24 laterally aligned 
staggered fibrillin-1 molecules or the additional presence of ~80 MAGP-1 
molecules per repeat (for a microfibril with 8 monomers in cross-section) 
in order to correspond to real microfibril mass maps and mass contrast 
levels. However, available evidence to date (STEM, rotary shadowing, 
and quick-freeze deep-etch imaging of sonicated zonular microfibrils) 
indicates that there are only about eight molecules in a microfibril 
cross-section (Baldock et al., 2001; Wallace et al., 1991). Elasticity in the 
staggered model would arise from flexibility at specific TB-cbEGF-like 
domain junctions. However, in any one molecule, there is a maximum 
of two TB-cbEGF junctions per interbead repeat, irrespective of the pre- 
cise region of each staggered molecule within that interbead repeat. 
Therefore, assuming approximately equal extension of all laterally aligned 
molecules, that would give only up to ~10 nm extension per repeat. 
However, the region between the beads, which measures ~35 nm in 
untensioned microfibrils, has been seen to extend to ~150 nm, so it is 
difficult to see how such extension could be accounted for on the basis of 
a maximum of two flexible domain junctions per repeat. It is also not clear 
what molecular mechanism might stop molecules sliding apart in this 
staggered model. Crosslinks would be more critical for stabilizing this 
model. 
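
The mass and extension arguments against the staggered model can be checked with simple arithmetic. The sketch below uses the values given in Table I and in the paragraph above. Rounding the monomer count up to whole layers of eight is an assumption made here (mirroring the 8, 16, and 24 monomer cases shown in Fig. 3), and the function names are hypothetical.

```python
import math

STAGGERED_MASS_8_MONOMERS_KDA = 923  # Table I, one-third stagger, 8 monomers
EXPERIMENTAL_MASS_KDA = {            # Table I, STEM mass per repeat
    "fetal bovine aorta": 2065,
    "fetal bovine nuchal ligament": 2335,
    "adult canine ciliary zonule": 2550,
    "adult murine skin": 2547,
}


def monomers_needed(experimental_mass_kda,
                    model_mass_kda=STAGGERED_MASS_8_MONOMERS_KDA,
                    model_monomers=8):
    """Monomers in cross-section needed to reach the measured mass per repeat."""
    return model_monomers * experimental_mass_kda / model_mass_kda


def round_up_to_layers(monomers, layer=8):
    """Round up to whole layers of `layer` monomers (an assumed packing unit)."""
    return layer * math.ceil(monomers / layer)


def extension_shortfall_nm(untensioned_nm=35.0, extended_nm=150.0,
                           available_nm=10.0):
    """Observed interbead extension not explained by ~10 nm of junction flexing."""
    return (extended_nm - untensioned_nm) - available_nm


for tissue, mass in EXPERIMENTAL_MASS_KDA.items():
    n = monomers_needed(mass)
    print(f"{tissue}: {n:.1f} monomers, i.e. {round_up_to_layers(n)} in layers of 8")

print(f"unexplained interbead extension: {extension_shortfall_nm():.0f} nm")
```

Each tissue gives between about 18 and 22 monomers, which rounds up to 24, in line with the ~24 molecules stated above. About 105 nm of the observed interbead extension remains unaccounted for.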

The refined epitope positions for antibodies 2502 and 11C1.3 were 
mapped onto the one-third stagger model and compared with the experi- 
mentally determined location on microfibrils. There is a discrepancy in the 
position of the epitope of 11C1.3 in this model. Antibody 11C1.3 binds to 
the arms/shoulder region of a microfibril at 41.1% of the bead-to-bead 
distance (Baldock et al., 2001). In the one-third stagger model, exons 18-20 
would be located very close to the bead, approximately 15 nm from their 
actual position (Fig. 3E). If the microfibril direction is reversed, then it is 
possible to find a better correlation between epitope position and the actual 
location on a microfibril. However, this model would place the N- and 
C-termini in the interbead region and the predicted mass profile would 
have a trough at the bead and a peak in the interbead (Fig. 3F). 
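
To put the 11C1.3 discrepancy in scale, the fractional epitope position can be converted to a distance. This is a sketch that assumes the untensioned periodicity of ~56 nm applies to the measurement; the function name is hypothetical.

```python
def epitope_offset_nm(fraction_of_repeat, periodicity_nm):
    """Distance along one repeat corresponding to a fractional position."""
    return fraction_of_repeat * periodicity_nm


PERIODICITY_NM = 56.0          # untensioned periodicity (assumed to apply here)
OBSERVED_FRACTION = 0.411      # 11C1.3 position as a fraction of bead-to-bead distance
MODEL_DISCREPANCY_NM = 15.0    # approximate offset of the one-third stagger model

observed_nm = epitope_offset_nm(OBSERVED_FRACTION, PERIODICITY_NM)
print(f"observed 11C1.3 position: {observed_nm:.1f} nm along the repeat")
print(f"model discrepancy: {MODEL_DISCREPANCY_NM / PERIODICITY_NM:.1%} of one repeat")
```

A ~15 nm offset is therefore more than a quarter of one repeat, which is large compared with the assumed 3 nm axial extent of a single domain.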
Fig. 3. TAMD calculated for staggered models of fibrillin-1 microfibril structure 
with 8 monomers in cross-section. (A) Lee et al. (2004) proposed a one-third staggered 
model with a four-domain overlap (assuming axial domain dimensions of 3 nm) of the 
N/C termini. The 55-domain repeating unit can be aligned at one-third and one-half 
staggers to produce mass peaks with periodicities of 54 and 81 nm, respectively. TAMD 
(primary sequence + Ca²⁺ + N-linked glycosylations + MAGP-1) calculated for (B) an 
approximately one-third stagger at domains 18, 19, and 20 and (C) for a 1/2 stagger at 
domain 28. (D) Comparison of theoretical (domain 19) and experimental (fetal bovine 
aorta and nuchal ligament) axial mass distributions (Sherratt et al., 1997). Theoretical 
mass distributions are depicted for 8 (TAMD domain 19), 16 (TAMD domain 19 × 2), 
or 24 (TAMD domain 19 × 3) monomers in cross-section. The total mass per repeat is 
923, 1846, or 2769 kDa for 8, 16, and 24 monomers, respectively. (E) Schematic 
diagram of the one-third stagger model described in Lee et al. (2004). The fibrillin 
molecule is colored as described above. Experimentally determined antibody epitope 
positions are indicated and compared with the predicted positions based on this model. 
There is a discrepancy with the position of antibody 11C1.3; the position in the model is 
approximately 15 nm away from the actual position on a microfibril. Also, the direction 
of the STEM mass map profile that was superimposed in Lee et al. (2004) was incorrect. 
It has been shown that the arm region (11C1.3 epitope) is in the shoulder of mass off 
the bead (Sherratt et al., 2001). (F) Schematic diagram of a one-third stagger model 
with the microfibril direction reversed, and therefore correct with respect to the 
antibody positions and STEM mass data. This model has the N- and C-termini in the 
interbead and therefore would have little mass at the predicted bead and a peak of mass 
in the interbead instead. 


VI. FIBRILLIN MICROFIBRILS IN AGING AND DISEASE 


A. Aging 


Elastic fibers are remarkably resilient structural components of tissues, 
and elastin has a half-life of about 70 years. Nevertheless, normal elastic 
tissue aging is generally associated with a progressive loss of elasticity (e.g., 
hardening of the arteries, loss of skin elasticity and wrinkling, and reduced 
pulmonary capacity). These mechanical changes reflect structural degenerative 
changes including denudation of the microfibrillar mantle surrounding 
elastic fibers, and exposure of the elastin core to proteolytic degradation. 
In zonules, which hold the lens in dynamic suspension, reduced integrity of 
the zonular filaments based on fibrillin microfibrils leads to lens dislocation, 
and also contributes to farsightedness in later life. Photodamage to skin 
results in elastosis (appearance of large amounts of disordered elastin) and 
loss of the microfibrillar network integrity at the dermal-epidermal junction 
(Watson et al., 1999). This loss of integrity of the elastic fiber network (from 
the microfibrillar oxytalan fibers of the papillary dermis, to elaunin fibers 
associated with small amounts of elastin in the middermis, to thick elastic 
fibers of the reticular dermis) leads to loss of skin elasticity. 

Degenerative structural changes to elastic fibers are often caused by 
proteolysis through the actions of elastases (including serine proteinases, 
such as neutrophil elastase) and matrix metalloproteinases (including 
MMP-2, MMP-9 [gelatinases] and MMP-12 [metalloelastase]) (Ashworth 
et al., 1999c; Hindson et al., 1999; Kielty et al., 1994), but also by age-related 
glycation, accumulation of molecules such as amyloid P and fibronectin, and 
loss of components such as proteoglycans as tissue hydrated status is altered 
in aging. The proteolytic changes may result in direct loss of microfibrils or 
elastic fibers. Cell adhesion to microfibrils may also be altered. 


B. Fibrillin Heritable Diseases 


Mutations in fibrillin-1 cause the autosomal dominant disorder Marfan 
syndrome and related disorders, termed fibrillinopathies (Robinson and 
Booms, 2001). MFS is characterized by life-threatening cardiovascular 
disease and severe skeletal and ocular defects. Progressive aortic root 
dilatation, with risk of aortic dissection, is the most serious clinical prob- 
lem for most classic MFS patients. Aortic dissection, rupture, or cardiac 
failure due to mitral/aortic valve regurgitation cause death in over 90% of 
these patients. Neonatal MFS patients die perinatally of congestive heart 
failure and cardiac valve insufficiency. Other overlapping fibrillin-1 related 
pathologies include MASS syndrome (mitral valve prolapse, aortic dilata- 
tion, and skin and skeletal manifestations syndrome), and autosomal 
thoracic aortic aneurysms and dissections (TAAD) (Hasham et al., 2003). 


1. Fibrillin-1 Mutations 


Over 350 MFS mutations have been identified (Collod-Beroud et al., 
2003). Missense mutations represent approximately two-thirds of all fibril- 
lin-1 mutations, the majority affecting cbEGF domains. Mutations that 
change a cbEGF cysteine or calcium-binding consensus residue usually 
cause classic MFS, although phenotypic effects are variable and depend on 
location, calcium-binding characteristics, and domain context. Mutations 
affecting residues other than cysteines and calcium-binding residues are 
rarer, and include those that create a cysteine or N-glycosylation site, those 
that affect a conserved glycine involved in cbEGF domain-domain packing, 
and mutations of nonconserved residues. Premature termination codon 
(PTC) mutations represent ~20% of FBN1 mutations. mRNAs carrying 
a PTC due to a frameshift or nonsense mutation are generally at reduced 
levels due to nonsense-mediated decay. Some PTCs cause mild disease, but 
others cause classic MFS. Splice site mutations represent ~12% of all 
mutations; they cause exon skipping or activation of cryptic splice 
sites, and lead to in-frame exon deletions and internally deleted fibrillin-1 
molecules. 


2. Genotype-Phenotype Relationships 


MFS has, at least in part, a dominant negative pathogenesis with mutant 
fibrillin-1 interfering with wild-type fibrillin-1 secretion, assembly, and/or 
function. Reduced levels of functional extracellular fibrillin-1 microfibrils 
impair vascular homeostasis. Genotype-phenotype relationships are poorly 
understood at the molecular level. Mutations occur throughout fibrillin-1 
with some correlations between location and disease severity, reflecting 
sequence-specific roles in secretion, assembly, and extracellular function. 
Localized mutation-specific effects also influence phenotype. Exons 24-32 
contain the mutations causing neonatal MFS and atypical severe MFS 
(in which patients suffer aortic dissection before 16 years of age) but also some 
mutations causing classic MFS. Mutation location is critical since similar 
mutations in different domains can have profoundly different phenotypic 
outcomes, presumably reflecting different roles for specific sequences in 
microfibril assembly, structure, and function. Mutation-specific domain 
misfolding effects can also strongly influence phenotypic outcome. 


a. Dominant Negative Pathogenesis. In MFS, the products of both wild-type 
and mutant fibrillin-1 alleles are coexpressed by cells, so mutant 
fibrillin-1 molecules can interfere with the secretion, assembly, and/or 
extracellular function of wild-type fibrillin-1. PTC mutations are associated 
with reduced amounts of mutant transcript and milder phenotypes, supporting the 
hypothesis that, below a certain threshold of mutant relative to wild-type 
fibrillin-1 (mutant FBN1, 6-16%), mutant molecules generally do not disrupt 
microfibrils to such an extent that severe disease results. However, a recent study 
using transgenesis has revealed that haploinsufficiency for wild-type fibrillin-1, 
rather than production of mutant protein, could be the primary 
determinant of failed microfibrillar assembly (Judge et al., 2004). 


b. Impaired Tissue Homeostasis. Defects in Fbn-1 gene-targeted mice imply 
that fibrillin-1 contributes vitally to vascular homeostasis. The collagen- 
rich adventitia sustains the bulk of hemodynamic stress, so aortic dilata- 
tion in MFS could reflect loss of adventitial tensile strength, possibly due 
to a direct (undefined) role for microfibrils in the adventitia or by altered 
medial elastic fibers affecting adventitial function. 


c. Increased Susceptibility to Proteolysis. When calcium is bound, contiguous 
cbEGF-like domain arrays adopt a stable rodlike form and microfibrils 
exhibit 56 nm periodicity and well-defined organization. Bound 
calcium can protect fibrillin-1 from proteolysis at cryptic proteolytic cleav- 
age sites. We identified a disease-causing mutation (E2447K) that disrupts 
calcium binding and exposes a cleavage site for pathologically relevant 
enzymes MMP-9 and MMP-12 (metalloelastase) (Ashworth et al., 1999c). 
Others have monitored mutation-induced domain conformational 
changes by assessing exposure of cryptic trypsin cleavage sites. 


d. Role in Regulating TGFβ Activity. Fibrillin-1 and microfibrils contribute 
to the regulated activation of members of the TGFβ superfamily of 
cytokines (Charbonneau et al., 2004). TGFβs are cytokines that regulate 
many cellular processes including proliferation, cell cycle arrest, apoptosis, 
differentiation, and ECM formation, through a heteromeric complex of 
type I and type II receptors with serine-threonine kinase activities in their 
cytoplasmic domains. Selected manifestations of MFS, including progressive 
pulmonary disease, reflect excessive cytokine activation and signaling 
(Neptune et al., 2003). Fbn-1-deficient mice have excessive TGFβ activity 
that probably underlies their tendency to develop emphysema and could 
explain other manifestations of MFS. Thus, perturbation of TGFβ signaling 
contributes to the pathogenesis of ECM disorders, but by a mechanism 
different from that reported in Fbn-1-deficient mice. Heterozygous 
TGFBR2 mutations can cause a second type of Marfan syndrome (MFS2; 
Mizuguchi et al., 2004). Four missense mutations have been identified in 
TGFBR2 in four unrelated probands, which lead to loss of TGFβ signaling 
activity on ECM formation. 


VII. SUMMARY 


In summary, fibrillin-1 microfibrils are critically important structural
and elastic polymers of the ECM, and are also key regulators of TGFβ
activity in development. Recent molecular studies have provided new
understanding of how they assemble in the pericellular environment, 
and testing of molecular alignment models is highlighting how their 
molecular packing arrangement endows them with their unique elastic 
properties. This information is contributing to our understanding of 
fibrillin-1 microfibril function in health and disease. 


ABSTRACT


Elastin is a key extracellular matrix protein that is critical to the elasticity 
and resilience of many vertebrate tissues including large arteries, lung, 
ligament, tendon, skin, and elastic cartilage. Tropoelastin associates with 
multiple tropoelastin molecules during the major phase of elastogenesis 
through coacervation, where this process is directed by the precise pattern- 
ing of mostly alternating hydrophobic and hydrophilic sequences that dic- 
tate intermolecular alignment. Massively crosslinked arrays of tropoelastin 
(typically in association with microfibrils) contribute to tissue structural 
integrity and biomechanics through persistent flexibility, allowing for 
repeated stretch and relaxation cycles that critically depend on hydrated 
environments. Elastin sequences interact with multiple proteins found in 
or colocalized with microfibrils, and bind to elastogenic cell surface recep- 
tors. Knowledge of the major stages in elastin assembly has facilitated the 
construction of in vitro models of elastogenesis, leading to the
identification of precise molecular regions that are critical to
elastin-based protein interactions.


I. ELASTIC FIBER 


The extracellular matrix imparts structural integrity on the tissues and 
organs of the body. It also acts as a dynamic modulator of a variety of 
biological processes. An important component of the extracellular matrix 
is the elastic fiber. Elastic fibers confer the properties of elastic recoil 
and resilience on all vertebrate elastic tissues, with the exception of 
lower vertebrates such as the lamprey (Debelle and Tamburro, 1999). 
Such properties are critical to the long-term function of these tissues. 
Elastic fibers are found within arteries, lung, skin, vertebral ligamenta flava, 
vocal cords, and elastic cartilage (Sandberg et al., 1981). They are
composed of two morphologically and chemically distinct components: elastin
and microfibrils. Elastin comprises approximately 90% of the elastic fiber
and forms the internal core. It is interspersed with and surrounded by a 
sheath of unbranched microfibrils with an average diameter of 10-12 nm 
(Ramirez, 2000). The microfibrils are composed of a complex array of 
macromolecules. 

The structure of elastic matrices differs among tissues. Their function is 
a consequence of their composition and organization or architecture. In 
arteries, the elastic fibers are organized into concentric rings of elastic 
lamellae around the arterial lumen (Fig. 1A). Each elastic lamella alter- 
nates with a ring of smooth muscle cells (Li et al., 1998). In lung, elastic 
fibers form a delicate latticework throughout the organ, concentrating in 
areas of stress such as the opening of the alveoli and alveolar junctions 
(Fig. 1B). They provide the architectural foundation, as well as the stretch
and recoil required for normal function (Starcher, 2000). In ligaments
and tendons, the fibers are oriented parallel to the basic tissue
organization. In elastic cartilage (Fig. 1C), the fibers are organized into
a large three-dimensional honeycomb configuration (Dietz et al., 1994).

Fig. 1. The arrangement of elastin in (A) aorta, (B) lung, and (C) ear
cartilage. Reprinted with permission from (A) Biodidac, (B) Dr. Alan
Entwistle, Ludwig Institute for Cancer Research, and (C) Dr. Gwen Childs.


A. Elastin 


Elastin is an extremely durable, insoluble biopolymer that does not turn 
over appreciably in healthy tissue. It is estimated to have a half-life of 
about 70 years (Petersen et al., 2002). Elastin is formed through the lysine- 
mediated crosslinking of its soluble precursor tropoelastin. Tropoelastin is 
an approximately 60-70 kDa protein, whose length depends on alternative
splicing. Tropoelastin exists as a monomer in solution in two forms: an 
open globular molecule and a distended polypeptide (Toonkool et al., 
2001a). Crosslinking of tropoelastin is initiated through the action of the 
enzyme lysyl oxidase (Kagan and Sullivan, 1982) or other family members 
such as the LOX-like proteins 1-4 (Liu et al., 2004; Noblesse et al., 2004). It 
is the physical properties of elastin that are considered principally respon- 
sible for the function of elastic tissues (Keeley et al., 2002). Elastin con- 
stitutes 30-57% of the aorta, 50% of elastic ligaments, 28-32% of major 
vascular vessels, 3-7% of lung, 4% of tendons, and 2-5% of the dry weight 
of skin (Vrhovski and Weiss, 1998). 


B. Microfibrils 

In developing elastic tissue, the microfibrils are the first components to 
appear in the extracellular matrix. They are then thought to act as a 
scaffold for deposition, orientation, and assembly of tropoelastin mono- 
mers. They are 10-12 nm in diameter, and lie adjacent to cells producing 
elastin and parallel to the long axis of the developing elastin fiber (Cleary, 
1987). 

Microfibrils are composed of several macromolecules. Major components
are the structural glycoproteins fibrillin-1 (Sakai et al., 1986),
fibrillin-2 (Zhang et al., 1994), and microfibril-associated glycoprotein-1
(MAGP-1; Gibson et al., 1986). The fibrillins are large (350 kDa), acidic,
cysteine-rich glycoproteins that appear as extended, flexible molecules
when viewed by electron microscopy. Their primary structure is dominated by
calcium-binding epidermal growth factor-like repeats (Kielty et al., 2002).
MAGP-1 is an acidic, 31 kDa glycoprotein that is covalently linked to the
microfibrils. Other proteins found associated with the microfibrils include
vitronectin (Dahlback et al., 1990), latent transforming growth factor
β-binding proteins (LTBPs; Kielty et al., 2002), emilin (Bressan et al.,
1993; Mongiat et al., 2000), microfibrillar associated proteins (MFAP), and
members of the fibulin family (Nakamura et al., 2002; Reinhardt et al.,
1996; Roark et al., 1995; Yanagisawa et al., 2002).

Proteoglycans such as versican (Isogai et al., 2002), biglycan, and decorin 
(Reinboth et al., 2002) interact with the microfibrils. They confer specific 
properties including hydration, impact absorption, molecular sieving, 
regulation of cellular activities, mediation of growth factor association, 
and release and transport within the extracellular matrix (Buczek-Thomas
et al., 2002). In addition, glycosaminoglycans have been shown to interact 
with tropoelastin through its lysine side chains (Wu et al., 1999). 


II. ELASTOGENESIS 


In vivo elastin fiber formation requires the coordination of a number of
important processes. These include the control of intracellular transcrip- 
tion and translation of tropoelastin, intracellular processing of the pro- 
tein, secretion of the protein into the extracellular space, delivery of 
tropoelastin monomers to sites of elastogenesis, alignment of the mono- 
mers with previously accreted tropoelastin through associating microfibril- 
lar proteins, and finally, the conversion to the insoluble elastin polymer 
through the crosslinking action of lysyl oxidase (Fig. 2). 


A. Elastin Gene 


The human gene encoding tropoelastin is present as a single copy localized
to the 7q11.2 region (Fazio, 1991). The primary transcript is approximately 40 kb
in length and contains small exons interspersed between large introns 
giving rise to an unusually low exon/intron ratio (Piontkivska et al., 2004). 
This sequence codes for an mRNA of ~3.5kb (Parks and Deak, 1990), 
which consists of a ~2.2 kb coding segment and a relatively large, 1.3 kb 3’ 
untranslated region (Rosenbloom et al., 1991). The human tropoelastin 
gene contains 34 exons. Two further exons were lost in primate evolution 
through two steps, when exon 35 was probably deleted through inter-Alu 
recombination as Catarrhines diverged from Platyrrhines (New World 
monkeys). More recently, exon 34 was lost when Homo sapiens separated 
from the common ancestor shared with chimpanzees and gorillas. The 
additional exons persist in the bovine, porcine, feline, canine, and mouse 
genomes (Szabo et al., 1999). Tropoelastin is distinguished by an exon
periodicity where functionally distinct hydrophobic and crosslinking
domains are encoded in separate alternating exons (Fig. 3). All the exons
exist as multiples of three nucleotides and the exon-intron borders are
always split in the same way (Rosenbloom et al., 1993). The first nucleotide
of a codon is found at the 3′ junction, while the second and third
nucleotides are present at the 5′ border of exons. These exons are likely
to have arisen through multiple intragenic duplication events (Kummerfeld
et al., 2003).

Fig. 2. Schematic representation of the classical model of elastogenesis.
Tropoelastin is transcribed in mammals from a single gene and alternatively
spliced in the nucleus. (A) Following translation and signal sequence
cleavage, tropoelastin associates with EBP and FKBP65 in the rough
endoplasmic reticulum. The tropoelastin-EBP complex then moves through the
Golgi and is secreted to the cell surface. (B) Secreted tropoelastin is
oxidized by a member of the lysyl oxidase family and tropoelastin associates
with the microfibrils and with other tropoelastin molecules through
coacervation to generate the nascent elastic fiber. (C) Continued secretion,
oxidation, and deposition of tropoelastin occupy the bulk of elastin
synthesis. The diagram is not drawn to scale. EBP, elastin binding protein;
FKBP65, 65-kDa FK506 binding protein; MAGP, microfibril associated
glycoprotein; LTBP, latent transforming growth factor β-binding protein;
MFAP, microfibril associated protein; LOXL, lysyl oxidase like.

The primary transcript of tropoelastin undergoes extensive alternative
splicing. As the splitting of codons at the exon-intron borders is consistent
throughout the molecule, alternative splicing occurs in a cassette-like
fashion with maintenance of the coding sequence (Bashir et al., 1989).
This results in the translation of multiple heterogeneous tropoelastin
isoforms (Indik et al., 1987). At least seven human exons are known to be
alternatively spliced: 22, 23, 24, 26A, 30, 32, and 33 (Parks and Deak, 1990;
Zhang M. C. et al., 1999b). Alternative splicing of individual exons may be
used to tailor the structural function of the protein in different tissues
(Piontkivska et al., 2004). It appears to be developmentally regulated and
tissue-specific, with age-related changes in isoform ratios observed in all
species that have been investigated (Heim et al., 1991; Parks and Deak,
1990; Starcher, 2000). The most frequently observed human tropoelastin
isoform lacks exon 26A, which is reportedly only expressed in certain
disease states (Debelle and Tamburro, 1999). Three human disorders have
been linked to mutations or deletions of the tropoelastin gene: cutis laxa,
supravalvular aortic stenosis, and Williams-Beuren syndrome.

Fig. 3. Domain structure of human tropoelastin. Human tropoelastin consists
of 34 domains and is dominated by alternating hydrophobic and crosslinking
regions.
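
The way cassette-like splicing maintains the coding sequence can be illustrated with a minimal sketch. It assumes only what is stated above, namely that every exon is a multiple of three nucleotides long and that codons are split identically at every border. The exon lengths and function name are hypothetical placeholders, not the real tropoelastin exon sizes.

```python
from itertools import combinations

def preserves_frame(exon_lengths, skipped):
    """Return True if removing the exons at indices `skipped` leaves the
    downstream reading frame unchanged (removed length is a multiple of 3)."""
    removed = sum(exon_lengths[i] for i in skipped)
    return removed % 3 == 0

# Hypothetical exon lengths (nucleotides), each a multiple of three.
exon_lengths = [84, 30, 57, 36, 45, 72]

# Every possible combination of skipped exons keeps the message in frame.
all_in_frame = all(
    preserves_frame(exon_lengths, combo)
    for r in range(1, len(exon_lengths) + 1)
    for combo in combinations(range(len(exon_lengths)), r)
)
```

Under these assumptions any subset of exons can be omitted without a frameshift, which is consistent with the multiple isoforms described; an exon of, say, 31 nucleotides would break this property.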


B. Regulation of Tropoelastin Expression 


Elastogenesis occurs primarily during late fetal and early neonatal 
periods. Elastin is synthesized and secreted from several cell types includ- 
ing smooth muscle cells, fibroblasts, endothelial cells, chondroblasts, and 
mesothelial cells (Uitto et al., 1991) with tissue-specific induction of elastin 
expression during development (Swee et al., 1995). After elastin has been 
deposited, its synthesis ceases and very little turnover of elastin is seen 
during adult life, unless the elastic fibers are subject to injury. In this case, 
a program of neosynthesis of elastin can be rapidly activated (Davidson,
2002). Both pre- and posttranscriptional control mechanisms have been
described for tropoelastin expression. The 5′-flanking region of the elastin
gene is typical of many housekeeping genes; it contains GC-rich islands, no
consensus TATA box, and multiple cis-elements that serve as potential sites
for transcription-regulatory factors (Uitto et al., 1991). Exogenous factors 
that alter elastin transcription include nuclear factor-1 family members 
(NF-1; Degterev and Foster, 1999), insulin-like growth factor-1 (IGF-1; 
Conn et al., 1996; Wolfe et al., 1993), basic fibroblast growth factor (bFGF; 
Carreras et al., 2002; Rich et al., 1999), tumor necrosis factor-α (Kahari
et al., 1992), interleukin 1 beta (Mauviel et al., 1993), and interleukin 10 
(Reitamo et al., 1994). 

Expression is extensively controlled at the posttranscriptional level 
(Parks, 1997), where mRNA decay is likely to be the predominant regu- 
latory mechanism. Tropoelastin pre-mRNA levels remain elevated in ma- 
ture adult lung, while steady-state mRNA levels are considerably reduced 
compared to those seen in neonatal lung (Swee et al., 1995). Transforming
growth factor-β1 (TGF-β1) is a strong stimulator of elastin expression and
is believed to affect elastin mRNA stability (Liu and Davidson, 1988). It is
thought to reduce the binding of a cytosolic, trans-acting protein whose
interaction with mRNA destabilizes elastin transcripts (Davidson, 2002),
although it is possible that microRNA is involved. A sequence coded by
exon 30 of tropoelastin confers transcript stability and responsiveness to
TGF-β1 (Zhang M. et al., 1999a).


C. Secretion 


After extensive splicing, the mature tropoelastin mRNA is exported from 
the nucleus. Translation occurs on the surface of the rough endoplasmic 
reticulum (rER) forming a 70-kDa polypeptide with an N-terminal signal 
sequence of 26 amino acids, which is cleaved as the protein enters the lumen 
of the rER (Grosso and Mecham, 1988). After release of the signal peptide, 
the protein travels through the lumen and is transported to the Golgi. 
Intracellularly, tropoelastin is likely to be chaperoned by a 67-kDa elastin- 
binding protein (EBP) that prevents intracellular self-aggregation and 
premature degradation (Hinek et al., 1995). The EBP is an enzymatically 
inactive spliced variant of β-galactosidase and binds predominantly to
hydrophobic domains on elastin (Hinek, 1995). FKBP65 is a peptidyl-prolyl 
cis/trans isomerase that also associates with tropoelastin in the secretory 
pathway (Patterson et al., 2002). 


444 MITHIEUX AND WEISS 


D. Incorporation of Tropoelastin into the Elastic Fiber 


After secretion, the EBP is likely to deliver tropoelastin to the micro- 
fibrillar scaffold site of fiber formation. The binding of microfibrillar 
galactosugars to the EBP results in the release of tropoelastin (Privitera 
et al., 1998). Here it interacts with the microfibrils, which align the tropo- 
elastin monomers, for subsequent formation of the elastic fiber. The 
formation of a specific transglutaminase crosslink between fibrillin-1 and 
tropoelastin may act to covalently stabilize the newly deposited tropoelastin 
on the microfibrils (Rock et al., 2004), which is facilitated by calcium- 
dependent binding of MAGP-1 to multiple sites within tropoelastin (Clarke 
and Weiss, 2004). The C-terminal region of tropoelastin plays a pivotal role 
in the ordered deposition of the monomer into the growing elastin poly- 
mer (Brown-Augsburger et al., 1996; Hsiao et al., 1999; Kozel et al., 2004). 
The EBP is recycled back to the intracellular endosomal compartments 
for reassociation with newly synthesized tropoelastin (Hinek et al., 1995). 
The two major processes required for the incorporation of tropoelastin 
into the growing elastic fiber are coacervation and crosslinking. 


E. Coacervation 


In order for tropoelastin molecules to be crosslinked, they must first 
associate and align so as to facilitate the generation of crosslinks between 
closely spaced lysines. Coacervation is proposed to be the molecular mecha- 
nism through which aligning and concentrating can occur (Urry, 1978). 
Coacervation refers to an inverse temperature transition whereby tropoelastin 
molecules aggregate with increasing temperature. It is an entropically driven 
reversible process that involves interactions between the hydrophobic do- 
mains of tropoelastin. At low temperatures, tropoelastin is soluble in 
solution; on raising the temperature, the solution becomes cloudy as the 
tropoelastin molecules aggregate and become ordered by interactions 
between hydrophobic domains, such as the oligopeptide repetitive se- 
quences GVGVP, GGVP, and GVGVAP (Vrhovski and Weiss, 1998). If left 
to settle, a viscoelastic phase forms containing highly concentrated tro- 
poelastin. This phenomenon is explained in terms of the effects of water 
on the tropoelastin molecule. At low temperatures, water forms a clath- 
rate-like structure around the hydrophobic regions of tropoelastin, 
keeping the protein unfolded. With increasing temperature, the ordered 
clathrate water is disrupted, rendering the hydrophobic domains free to 
fold and interact with other hydrophobic segments. Although order in 
the protein increases, the overall entropy of the system is increased 
through the disruption of the water (Urry and Long, 1977; Urry et al.,
1969, 1974). 
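
The entropic argument can be made concrete with the standard relation ΔG = ΔH − TΔS. The sketch below is illustrative only: it assumes a simple two-state association with temperature-independent ΔH and ΔS, and the numerical values are hypothetical, chosen solely so that the transition falls near 37°C. They are not measured parameters for tropoelastin.

```python
def delta_g(delta_h, delta_s, temperature_k):
    """Gibbs free energy change (J/mol) for association at a given temperature."""
    return delta_h - temperature_k * delta_s

def transition_temperature(delta_h, delta_s):
    """Temperature (K) above which association becomes favorable (dG < 0)."""
    return delta_h / delta_s

# Hypothetical values, picked only to place the transition near 37 degrees C.
DELTA_H = 62_030.0  # J/mol, unfavorable enthalpy change
DELTA_S = 200.0     # J/(mol K), net entropy gain of protein plus water

t_c = transition_temperature(DELTA_H, DELTA_S)
dg_cold = delta_g(DELTA_H, DELTA_S, 277.15)  # 4 degrees C
dg_warm = delta_g(DELTA_H, DELTA_S, 315.15)  # 42 degrees C
```

With a positive ΔH and a positive ΔS, ΔG is positive in the cold and negative above the transition, which is the signature of an inverse temperature transition: association is favored by warming rather than cooling.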

The process of coacervation is finely tuned to the physiological condi- 
tions of the extracellular matrix. Optimal coacervation of human tropo- 
elastin occurs at 37°C, 150 mM NaCl, and pH 7-8 (Vrhovski et al., 1997). 
The arrangement of sequences in tropoelastin is critical to this process 
of coacervation, where association through hydrophobic domains depends 
on their contextual location in the molecule (Toonkool et al., 2001b). 
Tropoelastin association rapidly proceeds through a monomer to n-mer 
transition, with little evidence of intermediate forms (Toonkool et al., 
2001a). 


F. Crosslinking 


Tropoelastin molecules are crosslinked in the extracellular space through 
the action of the copper-dependent amine oxidase, lysyl oxidase. Specific 
members of the lysyl oxidase-like family of enzymes are implicated in this 
process (Liu et al., 2004; Noblesse et al., 2004), although their direct roles are 
yet to be demonstrated enzymatically. Lysyl oxidase catalyzes the oxidative 
deamination of ε-amino groups on lysine residues (Kagan and Sullivan,
1982) within tropoelastin to form the α-aminoadipic-δ-semialdehyde,
allysine (Kagan and Cai, 1995). The oxidation of lysine residues by lysyl 
oxidase is the only known posttranslational modification of tropoelastin. 
Allysine is the reactive precursor to a variety of inter- and intramolecular 
crosslinks found in elastin. These crosslinks are formed by nonenzymatic, 
spontaneous condensation of allysine with another allysine or unmodified 
lysyl residues. Crosslinking is essential for the structural integrity and func- 
tion of elastin. Various crosslink types include the bifunctional crosslinks 
allysine-aldol and lysinonorleucine, the trifunctional crosslink merodes- 
mosine, and the tetrafunctional crosslinks desmosine and isodesmosine 
(Umeda et al., 2001). 

Two types of crosslinking domains exist in tropoelastin: those rich in
alanine (KA) and those rich in proline (KP). Within the KA domains,
lysine residues are typically found in clusters of two or three amino acids,
separated by two or three alanine residues. These regions are proposed to
be α-helical with 3.6 residues per turn of helix, although there is no
direct structural evidence (Brown-Augsburger et al., 1995; Sandberg et al.,
1971). Such a helix would position two lysine side chains on the same side
of the helix and so facilitate the formation of desmosine crosslinks.
Desmosine crosslinks are formed by the condensation of two allysine
residues on one tropoelastin and one allysine residue and one lysine
residue on another molecule. In human tropoelastin, the KA domains
are encoded by exons 6, 15, 17, 19, 21, 23, 25, 27, 29, and 31. The KP 
domains are encoded by exons 4, 8, 10, and 12. In these KP domains, the 
lysine pairs are separated by one or more proline residues and are flanked 
by prolines and bulky hydrophobic amino acids (Brown-Augsburger et al., 
1995). Desmosines and isodesmosines have not been found in association 
with KP domains. This is likely to be due to the steric constraints imposed 
by the presence of multiple proline residues that would not allow the 
formation of α-helical structure. Lysine residues are also found in domains
13, 14, and 36. 
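
The geometric argument for the KA domains can be checked with a small calculation. This sketch assumes an ideal helix with 3.6 residues per turn (100° of rotation per residue) and simply projects residues down the helix axis; the function name is illustrative, and no real structure is implied.

```python
def angular_separation(offset, residues_per_turn=3.6):
    """Smallest angle (degrees) between two residues `offset` positions apart
    when projected down the axis of an ideal helix."""
    angle = (offset * 360.0 / residues_per_turn) % 360.0
    return min(angle, 360.0 - angle)

# Lysines separated by two alanines sit at i and i+3; by three alanines, i and i+4.
sep_two_ala = angular_separation(3)    # about 60 degrees
sep_three_ala = angular_separation(4)  # about 40 degrees
sep_one_ala = angular_separation(2)    # about 160 degrees, nearly opposite faces
```

A spacing of two or three alanines therefore places the lysine side chains within roughly 40-60° of each other on one face of the helix, whereas a spacing of one residue would put them on nearly opposite faces.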

There is little information describing which lysine residues are involved 
in crosslink formation. This is largely due to the highly insoluble nature of 
elastin, making it difficult to analyze. However, one elastin crosslinking 
domain has been identified that joins three tropoelastin chains identified 
as 10, 19, and 25 (Brown-Augsburger et al., 1995). The only two KA 
domains that contain three lysine residues are encoded by exons 19 and 
25. They form a desmosine crosslink in an antiparallel arrangement of 19 
and 25, while the remaining lysine on each chain forms a lysinonorleucine 
crosslink with two lysine residues present on the KP domain encoded by 
exon 10. 


III. PROPERTIES OF ELASTIN


A. Physical Properties 


Purified elastin is pale yellow and has a characteristic blue fluorescence 
in ultraviolet light (Partridge, 1962). When dry, it is a hard, brittle glassy 
solid. On wetting, it becomes flexible and elastic. The water content of 
elastin is affected by temperature; a large increase is seen in the swollen 
volume of elastin with decreasing temperature (Lillie and Gosline, 2002). 
At 36°C, purified bovine ligamentum nuchae (which is primarily composed 
of elastin) contains 0.46 g water/g protein; at 2°C, this increases to 0.76 g
water/g protein (Gosline, 1978; Gosline and French, 1979). 

Young’s Modulus provides a measure for the elasticity or stiffness of a 
material (i.e., the greater the Young’s Modulus, the stiffer the material). 
The Modulus is determined from the slope of a stress/strain curve. The 
slope of this curve for elastin remains linear to an extension of ~70% 
(Gosline, 1976). The Young’s Modulus of elastic fibers is 300-600 kPa; for
collagen, it is 1 × 10⁶ kPa. The maximum extension of elastic fibers ranges
from 100-220% (Fung, 1993). Elastic fibers are able to undergo billions of
cycles of extension and recoil without mechanical failure (Keeley et al.,
2002).
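
As an illustration of how the Modulus is obtained from the slope of a stress/strain curve, the following minimal sketch fits a least-squares line through points in the linear region. The data are synthetic, generated with an assumed modulus of 450 kPa purely for demonstration; they are not measurements on elastin.

```python
def youngs_modulus(strain, stress):
    """Least-squares slope of stress against strain (same units as stress)."""
    n = len(strain)
    mean_x = sum(strain) / n
    mean_y = sum(stress) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(strain, stress))
    sxx = sum((x - mean_x) ** 2 for x in strain)
    return sxy / sxx

# Synthetic data: strain as a fraction (0 to 70% extension), stress in kPa.
strain = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
stress = [450.0 * e for e in strain]  # assumed modulus of 450 kPa

modulus_kpa = youngs_modulus(strain, stress)
```

With real measurements the fit would be restricted to the range over which the curve remains linear, since points beyond it would bias the slope.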


B. Structural Properties 


Numerous studies have been undertaken to elucidate the secondary
structure of soluble elastin. These studies have been performed on
elastin, elastin solubilized by oxalic acid (α-elastin) or potassium
hydroxide (κ-elastin), synthetic polypeptide models of elastin, and
tropoelastin. Techniques used include circular dichroism, FT-Raman, and
electron microscopy. No consensus has been reached on the overall structure
of elastin.

Circular dichroism (CD) studies on α-elastin (Tamburro et al., 1977),
κ-elastin, and bovine and human tropoelastin (Debelle et al., 1995; Vrhovski
et al., 1997) have demonstrated a conformational transition to increased
α-helical content with increasing temperature. The α-helical content
predicted for tropoelastin is probably confined to the crosslinking domains,
as the rest of the molecule is rich in helix-breaking proline residues
(Muiznieks et al., 2003).

Urry et al. (1974) proposed that the hydrophobic regions of elastin
containing the repeat peptides VPGVG, VPGG, and APGVGV form β-spirals
on coacervation of the molecule. β-Spirals contain recurring type II
β-turns. Using CD analysis, Jensen et al. (2000) showed that domain 26 of
human tropoelastin contained β-structure. Tamburro and colleagues have
demonstrated that the hydrophobic domains of human tropoelastin may
partly assume polyproline II (PPII) structures (Bochicchio et al., 2004).
Tropoelastin is likely to be a dynamic structure, as PPII does not contain
obvious intramolecular hydrogen bonds and is likely to be in equilibrium
with multiple conformations.

CD analysis of recombinant human tropoelastin shows that the molecule
is composed of 3% α-helix, 41% β-sheet, 21% β-turn, and 33% other
structure (Vrhovski et al., 1997). FT-Raman studies on human elastin
demonstrate derived secondary structures containing 8% α-helix, 36%
β-strand, and 56% unordered conformation (Debelle et al., 1998).

Macroscopically, elastin appears to be an amorphous mass. Ultrastructural
electron microscopy studies reveal that elastin has a fibrillar
substructure composed of parallel-aligned ~5 nm thick filaments that appear to
have a twisted ropelike structure (Gotte et al., 1974; Pasquali-Ronchetti 
et al., 1998). A variety of techniques have been used to resolve these 
filaments, including negative staining electron microscopy of sonicated 
fragments of purified elastic fibers (Serafini-Fracassini et al., 1976), freeze 
fracture electron microscopy techniques on bovine ligamentum nuchae
(Pasquali-Ronchetti et al., 1979), ultrathin cryosections of natural fresh 
elastic fibers (Fornieri et al., 1982), and aggregated tropoelastin molecules 
(Bressan et al., 1986). Scanning force microscopy also confirmed the 
filamentary network organization of elastin. These results suggest that 
crosslinked elastin molecules exhibit globular domains preferentially 
linked in one direction to form filaments and laterally joined at regular 
intervals (Pasquali-Ronchetti et al., 1998). 


C. Biological Properties 


The key biological function of elastin is to impart elasticity to organs and
tissues. However, studies have demonstrated that elastin and elastin
peptides have diverse biological properties (Faury et al., 1998). Evidence
for both direct and indirect elastin-mediated cell signaling has been
demonstrated. Elastin regulates arterial and lung terminal airway branching
morphogenesis (Li et al., 1998; Wendel et al., 2000). Elastin receptors
include the elastin-laminin receptor (ELR; Hinek et al., 1988), which relies
on an EBP subunit to facilitate association, and integrin αvβ3 (Rodgers
and Weiss, 2004). The ELR mediates the regulation of skin fibroblast
proliferation (Groult et al., 1991), chemotaxis for monocytes and
fibroblasts (Indik et al., 1990; Senior et al., 1980, 1982), age-dependent
vasodilation in aortic rings (Faury et al., 1995), endothelium-dependent
vasorelaxation (Faury et al., 1998), inhibition of the migratory response
of smooth muscle cells (SMC) to chemoattractants (Ooyama et al., 1987),
regulation of SMC proliferation (Ito et al., 1998), and the increase of Ca²⁺
levels in leukocytes and endothelial cells (Faury et al., 1998; Varga et al.,
1989). Coating poly(ethylene terephthalate), a polymer commonly used in
biomaterials, with elastin-derived protein promotes the growth,
proliferation, and phenotype maintenance of endothelial cells (Dutoya et
al., 2000).

The repetitive mechanical deformation in arterial tissues affects phenotype
and proliferation of smooth muscle cells (Birukov et al., 1995). The
physiological force of wall stress is a key determinant of arterial
development (Li et al., 1998) and cyclic stretching stimulates the synthesis
of extracellular matrix components by arterial smooth muscle cells in vitro
(Leung et al., 1976). Furthermore, the development of mechanically
functional small-caliber autologous arteries by seeding biodegradable
polyglycolic acid (PGA) tube scaffolds with SMCs on the exterior and
endothelial cells on the interior has been limited due to a lack of elastin
deposition (Niklason et al., 1999). These data collectively point to the
importance of an exogenous supply of elastin-like materials in biomaterial
constructs (van Hest and Tirrell, 2001).


IV. MECHANISM OF ELASTICITY


The fundamental driving force behind the remarkable elastic properties 
of the elastin polymer is believed to be entropic, where stretching de- 
creases the entropy of the system and elastic recoil is driven by a sponta- 
neous return to maximum entropy. The precise molecular basis for 
elasticity has not been fully elucidated and a number of models exist. 
Two main categories of structure-function models have been proposed: 
those in which elastin is considered to be isotropic and devoid of structure, 
and those which consider elastin to be anisotropic with regions of order 
(Vrhovski and Weiss, 1998). 


A. Random Chain Model 


This model is based on classic rubber theory, suggesting that elastin is 
made up of a network of random chains that are kinetically free and exist 
in a high entropic state. Stretching orders the chains and limits their 
conformational freedom, thus decreasing the overall entropy of the system 
(Hoeve and Flory, 1974). This provides the restoring force to the relaxed 
state.
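
For orientation, the textbook result of classic rubber theory for an ideal network in uniaxial extension can be sketched as follows. This is the standard affine-network expression, not a formula taken from the works cited here; the chain density is a hypothetical value chosen so that the small-strain modulus (three times the shear modulus) falls within the 300-600 kPa range quoted above for elastic fibers.

```python
BOLTZMANN = 1.380649e-23  # J/K

def nominal_stress(stretch, chain_density, temperature_k):
    """Nominal stress (Pa) of an ideal entropic network in uniaxial extension.

    stretch is the extension ratio (1.0 = relaxed); chain_density is the number
    of elastically active chains per cubic metre.
    """
    shear_modulus = chain_density * BOLTZMANN * temperature_k
    return shear_modulus * (stretch - stretch ** -2)

# Hypothetical network parameters (assumptions for illustration only).
CHAIN_DENSITY = 3.5e25  # chains per cubic metre
BODY_TEMP_K = 310.15

# Small-strain Young's modulus of the ideal network is three times the shear modulus.
small_strain_modulus_kpa = 3 * CHAIN_DENSITY * BOLTZMANN * BODY_TEMP_K / 1000.0
stress_at_50_percent = nominal_stress(1.5, CHAIN_DENSITY, BODY_TEMP_K)
```

Because the restoring force in this picture is purely entropic, the stress at fixed extension scales linearly with absolute temperature, which distinguishes it from elasticity that arises from bond deformation.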


B. Liquid Drop Model 


In this model, the hydration of hydrophobic side chains of tropoelastin, 
which become exposed to solvent on extension, is implicated in the 
stretch-induced decrease in entropy (Urry et al., 2002; Weis-Fogh and
Andersen, 1970). Weis-Fogh and Andersen (1970) proposed that globular, 
spherical monomers (oil droplets) of tropoelastin are crosslinked to form 
a three-dimensional aggregate. Deformation of the matrix exposes the 
interior nonpolar groups to water. Gosline (1980) suggested a large 
hydrophobic contribution to the stored elastic energy; however, a random 
network arrangement of tropoelastin is implied in this model. 


C. Oiled Coil Model 


This model is similar to the liquid drop model but regards the
tropoelastin monomer as fibrillar. It features a broad coil made up of the
repeating tetrapeptide units GVPG in which glycine residues occupy the
exterior portions exposed to solvent, while proline, valine, and other
hydrophobic residues are buried. Each monomer is fibrillar, made up of
alternating sections of crosslink regions and “oiled coils” (Gray et al.,
1973). The crosslink regions are rigid, while the oiled coil hydrophobic
regions are flexible. On extension, as for the liquid drop model, the
hydrophobic interior is exposed to water providing an entropic change in the
system.


D. Fibrillar Model 


The fibrillar model (Urry et al., 1974) proposes that elasticity arises from 
the properties of regular β-spiral structures comprising repetitive β-turns. 
These stable β-turns are formed by regularly repeating sequences (e.g., 
VPGVG, VPGG, and APGVGV) that are found in tropoelastin. The β-turns 
act as spacers between the turns of the β-spiral, suspending chain segments 
in a conformationally free state. On stretching, the librational entropy in 
these peptide segments is reduced, which provides the restoring force 
required for elasticity. 

Tamburro et al. (1991) also proposed a β-turn-based model, in which the 
β-turns are situated in the GXGGX repeating sequences. These glycine turns 
are considered to be more labile than their proline counterparts and able to 
interconvert, giving rise to dynamic β-turns sliding along the protein chain 
(Debelle and Tamburro, 1999). In this model, the polypeptide chain is 
freely fluctuating, and the system is essentially unstructured. The entropic 
restoring force is considered to be similar to that proposed by the random 
chain model. 


V. BIOMATERIALS 


In healthy individuals, the mature elastin molecule is a stable, insoluble 
protein. Degradation of elastin is extremely slow due to the extensive 
crosslinking of tropoelastin within the elastic fiber. However, with aging, 
injury, or the onset of a variety of acquired diseases, the degradation and 
excessive or aberrant remodeling of elastic fibers become apparent in 
arteries, lung, skin, and ligament (Osakabe et al., 2001). The correct 
assembly of the elastin polymer is critical for proper elastic function, 
and therefore the physiological properties of these tissues become com- 
promised. In aortic valves, damage to elastin leads to a passive elongation 
of the tissue, a reduction in extensibility, and an increase in stiffness (Lee 
T. C. et al., 2001b). Examples of common disorders that involve abnormal 
elastic fiber assembly and/or degradation include aortic aneurysms, ath- 
erosclerosis, chronic obstructive pulmonary disease, emphysema, and 
hypertension (Urban and Boyd, 2000). In addition, a range of genetic 
disorders can affect the elastic fiber system, leading to skeletal and skin 
abnormalities and vascular defects (Milewicz et al., 2000). 

Elastic fiber degradation due to age or disease is caused by the action 
of a group of proteolytic enzymes known as elastases (Faury, 2001). 
Elastolytic enzymes belong to four classes of proteases: serine, aspartic, 
cysteine, and metalloproteinases (Jacob et al., 2001). The degradation of 
elastin is accompanied by a significant amount of degradation 
products in the tissues and circulatory system. These elastin peptides 
can induce diverse biological responses, including extracellular 
matrix protein synthesis and deposition, cell attachment, migration, and 
proliferation. 

The need to repair or replace damaged tissues and organs is being 
increasingly met through the development of innovative biomaterials. 
Ideal implants should mimic the native cellular environment and possess 
appropriate mechanical properties, such as strength and elasticity. Conse- 
quently, extracellular matrix engineering has become a major area of 
interest in the field of tissue repair (Stock et al., 2001). Almost all somatic 
cells are in contact with the complex network of macromolecules that 
make up the extracellular matrix. This matrix is particularly plentiful in 
connective tissues where cells are only sparsely distributed (Sittinger et al., 
1996). Elastin-mediated elasticity is crucial for the normal function of skin, 
lung, bladder, and arteries. 

Multiple approaches to tissue replacement and guided tissue regen- 
eration are currently being used. These include using autografts, 
allografts, synthetically or naturally derived biomaterials, and cell:matrix 
constructs where cells are placed on or within the matrix (Fuchs et al., 
2001). Regardless of the approach, the necessary criteria for a suitable 
tissue replacement system are that it is biocompatible, bioactive, and 
biofunctional. 

The requirement for elastic behavior in many replacement tissues, and the 
diverse biological properties elicited by elastin, have led to increased interest 
in the development of artificial biodegradable elastomers such as hydro- 
gels (Hoffman, 2001), elastin-peptide based or elastin-mimetic polymers 
(Cappello et al., 1998; Dutoya et al., 2000; Huang et al., 2000; Keeley et al., 
2002; Martino and Tamburro, 2001; Martino et al., 2002; McMillan and 
Conticello, 2000; Panitch et al., 1999; Rovira et al., 1996; Urry et al., 1998), 
and devitalized exogenous elastin matrices from bovine, porcine, and canine 
sources (Goissis et al., 2000; Kajitani et al., 2000, 2001; Vardaxis et al., 1996). 
Proposed uses for elastin-based materials include prostheses for vascular 
walls, calcifiable matrices that could induce ossification in areas where bone 
formation is desired (Urry, 1978; Vardaxis et al., 1996), arterial graft coating 
to exploit the antiproliferative effect on smooth muscle cells (Ito et al., 1998), 
matrices for in situ formation of cartilaginous tissue (Betre et al., 2002) and 
targeted drug delivery (Cappello et al., 1998; Chilkoti et al., 2002). 

Elastomers are commonly used in applications that require compliance 
with soft or cardiovascular tissues (Peppas and Langer, 1994). Hydrogels 
are polymer networks that may absorb from 10-20% up to thousands of 
times their dry weight in water (Hoffman, 2001). Elastomeric polypeptides 
crosslinked into matrices absorb 400-600% of their dry weight in water 
(Lee J. et al., 2001a). Hydrogels are finding increased use in medicine and 
biomedical engineering because of their capacity for rapid solute transfer 
and physical resemblance to living tissues (Martin et al., 1998). Investiga- 
tions into the use of stimuli-sensitive hydrogels that sense environmental 
change, causing an induction of structural change to the material, are 
mounting due to their potential applications in the development of 
biomaterials and drug delivery systems (Miyata et al., 2002). 
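
The water-uptake figures quoted above are percentages of dry weight, so a matrix that absorbs 400% of its dry weight holds four times its own dry mass in water. The short Python sketch below works through that arithmetic. The function names and the 1 g sample are hypothetical and chosen only for illustration; they are not taken from the cited studies.

```python
def water_uptake_percent(dry_mass_g: float, swollen_mass_g: float) -> float:
    """Water absorbed, expressed as a percentage of the dry weight."""
    if dry_mass_g <= 0:
        raise ValueError("dry mass must be positive")
    if swollen_mass_g < dry_mass_g:
        raise ValueError("swollen mass cannot be less than dry mass")
    return 100.0 * (swollen_mass_g - dry_mass_g) / dry_mass_g


def swollen_mass(dry_mass_g: float, uptake_percent: float) -> float:
    """Total mass (polymer plus water) after absorbing the given uptake."""
    if dry_mass_g <= 0:
        raise ValueError("dry mass must be positive")
    return dry_mass_g * (1.0 + uptake_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical 1 g dry sample at the two ends of the 400-600% range.
    for uptake in (400.0, 600.0):
        total = swollen_mass(1.0, uptake)
        print(f"{uptake:.0f}% uptake: swollen mass {total:.1f} g")
```

On this definition, a hydrogel that absorbs "thousands of times" its dry weight corresponds to an uptake of more than 100,000%.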

The first elastin-based polymers contained the polypeptide sequence 
thought to be involved in the elastic properties of elastin: poly(GVGVP). 
These polymers were shown to exist in hydrogel, elastic, and plastic states 
(Guda et al., 1995). More recently, polymers that contain a variety of 
subdomains of elastin based on this sequence have been shown to exhibit 
self-assembling properties and lower critical solution temperature behavior. 
They are referred to as stimuli-responsive polymers (Nath and Chilkoti, 
2002). While these polymers display worthwhile mechanical properties, 
they lack many other elastin sequences of biological relevance. Conse- 
quently, they require the incorporation of epitopes that are directly bound 
by cellular adhesion receptors, such as RGD (Kobatake et al., 2000; Nicol 
et al., 1992) or REDV (Panitch et al., 1999), which stimulate endothelial 
cell binding. These approaches are being increasingly investigated as they 
are expected to encourage cellular infiltration and allow for remodeling 
and incorporation of elastin-based polymers as living tissue (Kajitani et al., 
2001). Remarkably, through its natural C-terminus, tropoelastin is now 
appreciated to bind directly to integrin αvβ3, which raises the opportunity 
of using elastin sequences instead of hybrid constructs to facilitate cell 
interactions (Rodgers and Weiss, 2004). 

While the use of exogenous, devitalized elastin purified from animal 
sources may be limited due to problems associated with prion-mediated 
spongiform encephalopathies such as Jakob-Creutzfeldt disease, inves- 
tigations into this source of elastin are still being considered. Partially 
devitalized collagen and elastin matrices purified from blood vessels ob- 
tained from beagle dogs are being examined for their potential use in 
small-diameter vascular grafts (Goissis et al., 2000). An acellular elastin 
patch made from porcine aorta has been used to successfully repair 
esophageal and duodenal injuries in pigs (Kajitani et al., 2000, 2001). 
Elastin-solubilized peptides from bovine ligamentum nuchae have been 
chemically bonded to Dacron (polyethylene terephthalate) to elicit 
biological activity and improve compatibility (Bonzon et al., 1995). 

An elastic polymer matrix was made using as few as three hydrophobic 
domains flanking two crosslinking domains (Bellingham et al., 2003). 
Recently, a synthetic elastin (Mithieux et al., 2004) was made in vitro 
through the chemical crosslinking of recombinant human tropoelastin 
(Martin et al., 1995). The synthetic elastin displays physical, structural, and 
biological properties similar to those of naturally occurring human elastin. 
These findings define the in vitro system as a useful model for elastogen- 
esis. Synthetic elastin has potential as a novel biomaterial that is easily 
manufactured, can be molded into a variety of shaped tissue substrates, 
and has a range of properties that are required for elastic, compliant, cell- 
interacting, and medically relevant applications. Knowledge of many of 
the molecules that interact with tropoelastin and elastin, combined 
with the development of in vitro model elastogenic systems, demonstrates 
the versatility of elastin and facilitates the development of a new generation 
of biomaterials based on synthetic human elastin that can be considered as 
scaffolds for the augmentation and repair of human tissue. 


TGFβ and, 443 
transcription in, 441-442 
tropoelastin and, 437, 439, 440, 
443, 444, 445 
Elastin-binding protein (EBP) chaperone 
elastin and, 443, 444, 448 
Elastin-laminin receptor (ELR) 
elastin and, 448 
Elastogenesis 
elastin and, 437-438, 440-446, 453 
Electron microscope (EM) analysis 
of elastin, 447-448 
Electron microscopy (EM) analysis 
of collagen fibrils, 344, 346, 
349, 362, 364 
collagen triple helix and, 326-327 
of fibrillin microfibrils, 417 
fibrinogen/fibrin and, 251, 254, 265, 268 
of network-forming collagens, 
378, 380, 388 
sequence repeats and, 25 
ELR. See Elastin-laminin receptor 
EM. See Electron microscopy analysis 
Endostatin 
collagen triple helix and, 303-304 


Envoplakin/periplakin 
IFAP structure/function and, 157-158 
Epiplakin 
IFAP structure/function and, 149 
sequence repeats and, 13 
Evolution 
of fibrinogen/fibrin, 283-284 
Expression 
of elastin, 441-443 
Extracellular matrix assemblies 
fibrillin microfibrils and, 405 


FACIT collagens 
collagen fibrils and, 342-343, 345, 357 
collagens and, 7-8 
Factor XIIIa 
fibrinogen/fibrin and, 248, 257-258, 263, 
266, 270-271, 274, 276, 284 
Fiber diffraction analysis 
of collagen triple helix, 302, 307, 308 
Fibril surfaces 
collagen fibrils and, 356-357 
Fibril-forming collagens 
collagen fibrils and, 342-343 
Fibrillar model 
of elastin, 449-450 
Fibrillin 
fibrous proteins and, 1, 8, 9, 10 
Fibrillin microfibrils 
AFM and, 411, 417, 420 
aging and, 426 
assembly of, 412-417 
beads-on-a-string and, 405, 406, 407 
calcium chelation in, 419 
cbEGF domains in, 407, 408-409, 410, 
411, 415, 420, 427, 428 
crosslinking in, 415, 417, 420, 423 
domains of, 406-407 
elasticity of, 405-406, 419-420, 423 
EM of, 417 
as extracellular matrix assemblies, 405 
fibrillin-1 homotypic interactions and, 
412-414 
fibrillinopathies of, 426-429 
fibrillogenesis and, 406 
flexibility in, 409, 411 
folding in, 412 
glycosylation of, 406-407, 411, 412, 413 
hinged alignment model in, 420-423 
HSPGs and, 412 
hybrid motifs and, 410 
hydrophilicity and, 411 
hydrophobicity and, 410, 411 
integrins and, 409-410, 412 
linkers and, 409 
LTBPs and, 415-416 
MAGP interactions and, 
414-416, 421-422 
mass spectroscopy studies of, 411 
maturation of, 417, 420 
MFAP interactions and, 414-416 
MFS and, 405, 410, 427-429 
molecular organization of, 411 
mutations of, 406, 409, 427-429 
N-/C- termini and, 410-411, 413, 414, 423 
NMR studies of, 408 
PDI and, 411 
periodicity and, 419 
proline/glycine-rich regions and, 410 
proteoglycans and, 412, 416, 426 
recombinant studies of, 413 
RGD integrin recognition site and, 
409-410, 415 
staggered alignment model in, 
420-423, 424-426 
staggering in, 419, 420 
STEM mass mapping and, 
417, 418, 419, 423 
TAMD and, 422, 425 
tandem repeats of, 408 
TB domains in, 407 
TB motifs in, 407-408, 409-410, 420 
TGFβ and, 409-410, 415-416, 428-429 
tropoelastin and, 415, 416, 421-422 
x-ray analysis of, 408, 419 
Fibrillin-1 homotypic interactions 
fibrillin microfibrils and, 412-414 
Fibrillinopathies 
of fibrillin microfibrils, 426-429 
Fibrillogenesis 
fibrillin microfibrils and, 406 
Fibrils, type I collagen-rich. See also 
Collagen fibrils 
collagen fibrils and, 343-344 
Fibrils, type II collagen-rich 
collagen fibrils and, 343-344 
Fibrin polymerization 
fibrinogen/fibrin and, 248-249, 263, 
264-269, 270 
Fibrin/fibrinogen 
fibrous proteins and, 5-6 
Fibrinogen 
coiled coil structures and, 71-72 
sequence repeats and, 20-21 
Fibrinogen/fibrin 
afibrinogenemias in, 281 
albumin and, 273-274 
binding pockets and, 269 
biosynthesis of, 260-262 
calcium binding and, 247, 257, 259-260, 
269, 270, 277, 284 
carcinomas and, 261, 282, 283 
cell adhesion and, 274, 278, 285 
chain assembly in, 261-262 
cirrhosis, 256 
clotting and, 248, 250, 256-257, 263, 264, 
268-269, 270, 272-273, 277-278, 280 
coiled coils and, 256, 258, 259 
crosslinking in, 247, 257, 266, 
270-272, 284 
cytokines and, 261 
desialylation of, 256 
diabetes and, 283 
domains of, 253-256, 264, 265, 270, 272 
dysfibrinogenemias in, 264, 279-283, 284 
EM analysis of, 251, 254, 265, 268 
evolution of, 283-284 
factor XIIIa and, 248, 257-258, 263, 266, 
270-271, 274, 276, 284 
fibrin polymerization and, 248-249, 263, 
264-269, 270 
fibrinolysis and, 248-250, 264, 275-277 
fibrinopeptides and, 247-248, 251, 253, 
260, 263-264, 267, 270, 
278-279, 280-281 
fibroblast growth factor-2 and, 274-275 
fibroblasts and, 277 
fibronectin and, 271, 273 
fibulin and, 274 
glycoprotein structure of, 247, 248 
glycosylation of, 256-257 
hydrophobicity and, 256, 259, 263 
integrins and, 248, 273-274, 277-279, 285 
interleukins and, 274-275 
knobs/holes in, 248, 254, 257, 
258, 264, 284 
leukocytes and, 279 
Lp(a) and, 276-277 
microcalorimetry of, 254-255 
mutagenesis of, 260, 264, 279-281 
physico-chemical properties of, 
251-253, 259 
plasminogen and, 248-250, 275, 277 
platelets and, 249-250, 262, 274, 
277-279, 281, 285 
protofibrils and, 266, 267-268, 273 
signaling and, 260-261 
sources of, 262 
structure/properties of, 251-260 
TAFI and, 274-275 
thrombin and, 247, 258, 263-264, 277 
thrombospondin and, 274 
tPA and, 275-277 
transgenic mice and, 278, 279 
vascular endothelial growth factor and, 
274-275 
von Willebrand factor and, 274 
x-ray analysis of, 248, 251, 255, 256, 
257-259, 265, 270, 284 
Fibrinolysis 
fibrinogen/fibrin and, 248-250, 
264, 275-277 
Fibrinopeptides 
fibrinogen/fibrin and, 247-248, 
251, 253, 260, 263-264, 267, 
270, 278-279, 280-281 
Fibroblast growth factor-2 
fibrinogen/fibrin and, 274-275 
Fibroblasts 
fibrinogen/fibrin and, 277 
Fibronectin 
fibrinogen/fibrin and, 271, 273 
Fibrous long spacing (FLS) fibrils 
collagen fibrils and, 355-356 
Fibrous proteins, 1, 2. See also Collagens; 
Sequence repeats, in fibrous proteins 
actinins and, 1, 5, 10 
amyloids and, 2 
coiled-coil proteins and, 1, 2, 3-4, 5, 8 
collagens and, 1, 2, 6-8 
connective tissue and, 6-9 
dystrophin/MS and, 5, 10 
EF hands and, 5, 27-28 
elastic fibers in, 1, 5, 6, 8 
elastins and, 1, 2, 9, 10 
fibrillin, 1, 8, 9, 10 


fibrin/fibrinogen and, 1, 2, 5-6 
α-fibrous proteins and, 2-6, 10, 12 
filaments and, 1, 2 
IFAPs and, 4-5 
IFs and, 4-5 
keratins and, 4 
NMR and, 2, 5, 9 
prions and, 2 
proteoglycans and, 6 
repetitive motifs/heptads and, 2-3 
spectrins and, 1, 5, 10 
tropoelastin and, 8-9 
tropomyosin and, 3 
x-ray crystallography of, 2, 5-6, 9 
α-fibrous proteins 
fibrous proteins and, 2-6, 10 
sequence repeats and, 14 
Fibulin 
fibrinogen/fibrin and, 274 
Filaggrin 
IFAP structure/function and, 168-169 
sequence repeats and, 17 
Filament formation 
fibrous proteins and, 12 
sequence repeats and, 12, 20-21 
Filaments, beaded 
network-forming collagens and, 376 
Fimbrin 
IFAP structure/function and, 172 
Flexibility 
of elastin, 437, 448, 449-450 
in fibrillin microfibrils, 409, 411 
of IF chains, 124-125, 127-128 
of network-forming collagens, 377, 399 
spectrin superfamily and, 
215, 217, 219 
Flp. See Fluoroproline 
FLS. See Fibrous long spacing fibrils 
Fluoroproline (Flp) 
collagen triple helix and, 322-323, 332 
Focal contacts 
IFAP structure/function and, 162-163 
spectrin superfamily and, 211 
Focal segmental glomerulosclerosis (FSGS) 
spectrin superfamily and, 231-232 
Folding 
coiled coil structures and, 56-60 
in fibrillin microfibrils, 412 
Fos:Jun system 
coiled coil design and, 91, 94 
FSGS. See Focal segmental 
glomerulosclerosis 
Funnel structures 
coiled coil structures and, 67, 68 


G 


GAG. See Glycosaminoglycans 
Globular domains 
in network-forming collagens, 
377, 380-383, 392, 396, 398 
Globular proteins 
coiled coil design and, 83 
Glycine residues 
sequence repeats and, 18-19 
Glycoprotein structure 
of fibrinogen/fibrin, 247, 248 
Glycosaminoglycans (GAG) 
collagen fibrils and, 362, 363 
Glycosylation 
of fibrillin microfibrils, 406-407, 
411, 412, 413 
of fibrinogen/fibrin, 256-257 
Gly-X-Y repeats 
collagen triple helix and, 301-302, 
311-312, 319, 320, 323, 
327, 329, 330 
in network-forming collagens, 377, 386, 
388, 389, 392, 399 
Golgi 
elastin and, 443 
IFAP structure/function and, 177-178 
network-forming collagens and, 389 
G-protein signaling 
IFAP structure/function and, 161-162 
Growth regulation 
of collagen fibrils, 357-359 


H 


H1 subdomains 
IF chains and, 117, 119, 127 
H2 subdomains 
IF chains and, 132-133 
Head domains 
IF chains and, 116-120 
Head-to-tail domains 
sequence repeats and, 23, 24-25 


HEAT repeats 
sequence repeats and, 30 
Helical twists 
in collagen triple helix, 308, 311 
Helical wheels 
coiled coil design and, 84, 85 
Helices, compound 
coiled coil structures and, 39 
Helix initiation motif 
IF chains and, 121 
Helix termination motif 
IF chains and, 131 
Heparan sulfate proteoglycans (HSPG) 
fibrillin microfibrils and, 412 
Heptads 
coiled coil design and, 82-83, 84, 93 
coiled coil structures and, 45, 51, 
53, 57, 61 
fibrous proteins and, 2-3 
IF chains and, 114, 120, 121, 124, 
127, 128, 129, 130 
sequence repeats and, 19-21, 25 
Heterotetramer design 
coiled coil design and, 95-96 
Hexagonal aggregation 
network-forming collagens and, 
376, 389, 390, 391 
Hinged alignment model 
in fibrillin microfibrils, 420-423 
Hodges design 
coiled coil design and, 92-93, 102-103 
Homodimer design 
coiled coil design and, 104-105 
Homopolymerization 
IF chains and, 114-115 
Hourglass structures 
coiled coil structures and, 63, 66, 67 
HSPG. See Heparan sulfate proteoglycans 
Hybrid motifs 
fibrillin microfibrils and, 410 
Hydration networks 
collagen triple helix and, 
314-315, 316, 322 
Hydrogels 
elastin and, 452 
Hydrogen bonding 
in collagen triple helix, 308, 310, 
311-312, 313, 322 
Hydrophilicity 
fibrillin microfibrils and, 411 
Hydrophobic regions 
coiled coil design and, 83-86, 87, 
89, 90 
coiled coil structures and, 40, 42, 43, 53, 
54, 56-57, 59, 60 
in collagen triple helix, 315-316 
in elastin, 444-445, 447 
fibrillin microfibrils and, 410, 411 
fibrinogen/fibrin and, 256, 259, 263 
in network-forming collagens, 380, 386, 
388, 391-392, 395, 396 


IF. See Intermediate filaments 
IF chains, sequence/structure 
acidic residues and, 117, 121, 122, 125, 
128, 132 
aggregation characteristics and, 122 
apolar residues and, 117, 118, 121, 124, 
126, 129 
assembly of, 131-132, 136 
β-sheets and, 120, 133 
basic residues and, 117, 121, 122, 
125, 126, 132 
charged/apolar residue ratio and, 124 
crosslinking and, 124, 127, 128 
deletions/insertions and, 117, 118 
flexibility of, 124-125, 127-128 
H1 subdomain and, 117, 119, 127 
H2 subdomain and, 132-133 
head domains and, 116-120 
helix initiation motif and, 121 
helix termination motif and, 131 
heptad repeats and, 114, 120, 121, 124, 
127, 128, 129, 130 
homopolymerization in, 114-115 
ionic interactions of, 113-114, 121, 126, 
128, 131 
keratins and, 113, 114, 117-120, 
127, 128, 132-135, 137 
key residues and, 113-114, 131 
linker L1 and, 123-125 
linker L2 and, 128-129 
linker L12 and, 127-128 
mutations and, 122-123, 125, 129 
nestins and, 114 
NFs and, 114, 135 
nucleating point and, 126 
posttranslational modifications and, 
135-136 
rod domains and, 135 
segment 1A and, 120-123 
segment 1B and, 125-127 
segment 2A and, 128 
segment 2B and, 129-132 
stability of, 123, 126 
stammers/stutters/skips and, 
124, 129, 131 
structural domains and, 115-116 
tail domains and, 132-136 
trigger motif and, 126, 131 
x-ray analysis of, 127 
IFAP. See Intermediate filament associated 
proteins 
IFAP structure/function, 143-144 
14-3-3 proteins and, 176-177 
ABD proteins and, 146, 155, 156, 158, 167, 
176-177 
actin rich cortex and, 162 
actins and, 146, 156, 171-173 
apoptosis and, 174-175 
armadillo proteins and, 159-161 
BP230 isoforms and, 152-154 
calcium binding/EF hands and, 
168, 169, 170 
calponin/CH domains and, 
153, 155, 156, 171-172 
cell cycles/kinases and, 175-176 
cell homeostasis and, 144, 181 
cell process control, 181-182 
cell surface interactions and, 159-168 
chaperones/stress proteins and, 173-174 
costameres and, 164-165, 181 
crosslinking and, 144, 146, 170, 171 
desmocalmin/keratocalmin and, 161 
desmoplakins and, 149-152 
desmosomes and, 146, 147, 149, 158 
DPCs and, 162, 164, 165-166 
dystrobrevin and, 164, 165-166 
envoplakin/periplakin and, 157-158 
epiplakin and, 149 
filaggrin and, 168-169 
fimbrin and, 172 
focal contacts and, 162-163 
G-protein signaling and, 161-162 
IF families and, 144, 145, 146 
KAPs and, 170-171 
keratin IFs and, 180 
lamins and, 144, 181 
membrane trafficking and, 177-178 
membranes/Golgi and, 177-178 
motor molecules and, 178-180 
muscle cell surface and, 163-165 
muscular dystrophy and, 165, 166 
myosin va and, 179 
nebulin and, 172-173 
NFs and, 167-168, 175, 179-181 
nudel and, 179 
Parkinson’s disease and, 179 
pinin and, 161 
PKPs and, 160-161 
plakins and, 146-148, 149 
plectin and, 154-157 
polycystin-1 and, 161-162 
PRDs and, 150, 157, 159 
sequence specificity and, 150-151 
signaling/cell metabolism and, 
144, 173-178 
spectrin/ankyrin and, 166, 167-168 
trichohyalin and, 169-170 
vimentin and, 163, 169, 176, 178-179 
x-ray analysis and, 150 
z-discs and, 163-164, 172-173 
Imidic acids 
collagen triple helix and, 301-302, 308, 
312, 318, 319, 322 
Insertions 
coiled coil structures and, 49, 50, 51 
Integrin binding protein (BP) 
collagen triple helix and, 308, 309, 315, 
316, 327, 328, 329, 332 
Integrins 
fibrillin microfibrils and, 409-410, 412 
fibrinogen/fibrin and, 248, 273-274, 
277-279, 285 
network-forming collagens and, 387 
Interleukins 
fibrinogen/fibrin and, 274-275 
Intermediate filament associated 
proteins (IFAP) 
fibrous proteins and, 4-5 
Intermediate filaments (IF) 
coiled coil structures and, 72 
families of, 144, 145, 146 
fibrous proteins and, 4-5 
Interruptions 
in network-forming collagens, 377, 399 


Ionic interactions 
IF chains and, 113-114, 121, 
126, 128, 131 


K 


KAP. See Keratin-associated proteins 
Keratin IFs 
IFAP structure/function and, 180 
Keratin-associated proteins (KAP) 
IFAP structure/function and, 
170-171 
sequence repeats and, 21-22, 23 
Keratins 
coiled coil structures and, 38, 45, 
57-58, 71-72 
fibrous proteins and, 4 
IF chains and, 113, 114, 117-120, 127, 128, 
132-135, 137 
sequence repeats and, 14, 16, 17, 31 
Key residues 
IF chains and, 113-114, 131 
Kinks/crimps 
collagen fibrils and, 345, 350, 
361, 362, 363 
Knobs-into-holes 
coiled coil structures and, 39, 40, 42, 48, 
49, 51, 53, 54-55, 63, 67, 70 
fibrinogen/fibrin and, 248, 254, 
257, 258, 264, 265, 266, 284 
Knobs-to-knobs 
coiled coil structures and, 54-56, 70 
Knockout mice studies 
collagen fibrils and, 358-359 


Lamins 
IFAP structure/function and, 144, 181 
Latent TGFβ-binding proteins (LTBP) 
elastin and, 439-440 
fibrillin microfibrils and, 415-416 
Lateral packing 
in collagen fibrils, 
345-346, 347, 350-354 
Leucine zippers 
coiled coil design and, 
85, 87-88, 91, 97, 101, 104 
coiled coil structures and, 
42, 45, 56, 57, 58, 61, 70, 72 
sequence repeats and, 19-20 
Leucine-rich repeats (LRR) 
sequence repeats and, 29-30 
Leukocytes 
fibrinogen/fibrin and, 279 
Ligand binding 
collagen triple helix and, 307, 326-328 
Linear/lateral aggregation 
network-forming collagens and, 376, 380, 
388, 396, 399 
Linkers 
fibrillin microfibrils and, 409 
IF chains and, 123-125, 127-128, 129 
spectrin superfamily and, 209, 212, 215, 
218, 220 
Lipid binding 
spectrin superfamily and, 224-225 
Lipoprotein a (Lp(a)) 
fibrinogen/fibrin and, 276-277 
Liquid drop model 
of elastin, 449 
Lp(a). See Lipoprotein a 
LRR. See Leucine-rich repeats 
LTBP. See Latent TGFβ-binding proteins 


MAGP interactions 
elastin and, 439, 444 
fibrillin microfibrils and, 414-416, 
421-422 
Mannose binding lectin (MBL) 
collagen triple helix and, 326, 327, 
330, 332 
Marfan syndrome (MFS) 
fibrillin microfibrils and, 405, 410, 
427-429 
fibrous proteins and, 9 
MASPs 
collagen triple helix and, 326, 327, 328 
Maturation 
of fibrillin microfibrils, 417, 420 
MBL. See Mannose binding lectin 
Membrane trafficking 
IFAP structure/function and, 177-178 
MexA 
coiled coil structures and, 63, 66, 67 
MFAP. See Microfibril-associated protein 
interactions 
MFS. See Marfan syndrome 
Microcalorimetry 
of fibrinogen/fibrin, 254-255 
Microfibril-associated protein (MFAP) 
interactions 
elastin and, 440 
fibrillin microfibrils and, 414-416 
Microfibrils 
elastin and, 439-440 
Models 
of collagen fibrils, 351-354, 356, 363 
Molecular dynamics studies 
collagen triple helix and, 
318-319, 331 
Molecular packing 
collagen fibrils and, 
345-350, 351, 356 
in collagen triple helix, 316-318 
Molecular trafficking 
spectrin superfamily and, 204 
Motifs 
PROSITE database of, 13 
repetitive, 2-3 
structural/functional repeats and, 
11-12, 17, 18 
Motor molecules 
IFAP structure/function and, 178-180 
Muscle cell surfaces 
IFAP structure/function and, 163-165 
Muscular dystrophy 
IFAP structure/function and, 165, 166 
Mutations 
coiled coil design and, 81 
collagen fibrils and, 345 
collagen triple helix and, 301-302, 
329-332 
fibrillin microfibrils and, 406, 409, 
427-429 
fibrinogen/fibrin and, 
260, 264, 279-281 
IF chains and, 122-123, 125, 129 
Myosin va 
IFAP structure/function and, 179 
Myosin/paramyosin 
sequence repeats and, 20-21, 25 
Myosin/tropomyosin 
coiled coil structures and, 38, 45, 
57-58, 72 


N-/C- termini 
fibrillin microfibrils and, 
410-411, 413, 414, 423 
NC4 domains 
collagen fibrils and, 357 
Nebulin 
IFAP structure/function and, 
172-173 
Nestins 
IF chains and, 114 
sequence repeats and, 19 
Neurofilament (NF) chains 
IF chains and, 114, 135 
IFAP structure/function and, 
167-168, 175, 179-181 
NF-H sequence repeats and, 15-16, 19 
NF. See Neurofilament chains 
NF-H. See Neurofilament (NF) chains 
NMR. See Nuclear magnetic resonance 
NMR analysis 
collagen triple helix and, 
302-303, 307, 312, 
314, 316, 318, 332 
Nuclear magnetic resonance (NMR) analysis 
of collagen fibrils, 347-348, 349 
of fibrillin microfibrils, 408 
fibrous proteins and, 2, 5, 9 
sequence repeats and, 17, 31 
Nucleating point 
IF chains and, 126 
Nucleation 
collagen triple helix and, 307, 311 
Nudel 
IFAP structure/function and, 179 


O 


OI. See Osteogenesis imperfecta 
Oiled coil model 
of elastin, 449-450 
Oligomer specification 
coiled coil design and, 86-90 
Oligomer specificity 
coiled coil structures and, 57-58, 61 
Osteogenesis imperfecta (OI) 
collagen triple helix and, 
329, 331, 332 


Packing geometries 
coiled coil design and, 86-90 
Packing/acute 
coiled coil design and, 86, 89 
Packing/parallel 
coiled coil design and, 86, 88 
Parallel structures 
coiled coil design and, 92-101 
Parkinson’s disease 
IFAP structure/function and, 179 
PDB. See Protein Data Bank 
PDI. See Protein disulphide isomerase 
Peptide Velcro design 
coiled coil design and, 93-95, 103-104 
Periodicity 
fibrillin microfibrils and, 419 
PH. See Pleckstrin homology domain 
Phosphorylation/dephosphorylation 
sequence repeats and, 16 
Pinin 
IFAP structure/function and, 161 
Pitch angle 
coiled coil structures and, 44-45 
Pitch length 
sequence repeats and, 19-20 
PKP. See Plakophilins 
Plakin repeat domains (PRD) 
IFAP structure/function and, 
150, 157, 159 
Plakins 
IFAP structure/function and, 
146-148, 149 
Plakophilins (PKP) 
IFAP structure/function and, 160-161 
Plasminogen 
fibrinogen/fibrin and, 248-250, 275, 277 
Platelets 
fibrinogen/fibrin and, 249-250, 262, 274, 
277-279, 281, 285 
Pleckstrin homology (PH) domain 
spectrin superfamily and, 205, 210, 
224-225 
Plectin 
IFAP structure/function and, 154-157 
Polar residues 
coiled coil design and, 89, 91, 92 
Polycystin-1 
IFAP structure/function and, 161-162 
Polyproline binding domains 
spectrin superfamily and, 225-226 
Posttranslational modifications 
IF chains and, 135-136 
PRD. See Plakin repeat domains 
Prediction/analysis programs 
coiled coil structures and, 45-49 
Preferred repeats 
coiled coil design and, 84, 86 
Prions 
fibrous proteins and, 2 
Procollagens 
collagen fibrils and, 357-358, 363 
Proline/glycine-rich regions 
fibrillin microfibrils and, 410 
Protein Data Bank (PDB) 
coiled coil design and, 89-90 
coiled coil structures and, 48, 61 
collagen triple helix and, 319 
Protein disulphide isomerase (PDI) 
fibrillin microfibrils and, 411 
14-3-3 proteins 
IFAP structure/function and, 176-177 
Proteoglycans 
collagen fibrils and, 341, 342, 357, 358, 
361, 362, 363 
collagen triple helix and, 326 
elastin and, 440 
fibrillin microfibrils and, 412, 416, 426 
fibrous proteins and, 6 
sequence repeats and, 29-30 
Protofibrils 
fibrinogen/fibrin and, 266, 267-268, 273 
Puckers 
in collagen triple helix, 321, 322, 323 


R 


Random chain model 
of elastin, 449 
RBC. See Red blood cell studies 
Recombinant studies 
collagen triple helix and, 316, 319, 
324, 327 
of fibrillin microfibrils, 413 
Red blood cell (RBC) studies 
spectrin superfamily and, 210-211, 218, 
229-230 
Repeats, exact 
sequence repeats and, 15, 16 
RGD integrin recognition sites 
fibrillin microfibrils and, 409-410, 415 
Ridges-into-grooves 
coiled coil structures and, 40, 61, 70 
Right-handed coils 
coiled coil design and, 96-97 
Rod domains 
IF chains and, 135 
spectrin superfamily and, 203, 208, 
209, 219, 220-221, 228, 231 


S 


SAF. See Self-assembled protein fibers 
Salt-bridges 
coiled coil design and, 90-91, 99 
Scanning transmission EM (STEM) mass 
mapping 
fibrillin microfibrils and, 417, 418, 
419, 423 
Secreted protein acidic rich in cysteine 
(SPARC) 
collagen fibrils and, 359 
Secretion 
of elastin, 443 
Segment 1A 
IF chains and, 120-123 
Segment 1B 
IF chains and, 125-127 
Segment 2B 
IF chains and, 129-132 
Self-assembled protein fibers (SAF) 
coiled coil design and, 97-99 
Sendai virus 
coiled coil structures and, 53, 60 
Sequence repeats 
collagen triple helix and, 301-302 
Sequence repeats, in fibrous proteins 
acidic residues and, 12 
actin and, 14 
apolar residues and, 12, 15, 19-21 
basic residues and, 12 
calcium binding/EF hands and, 
14-15, 27-28 
charged residues and, 20 
classes of, 11-15 
collagen and, 13, 17-18 
contiguous repeats and, 12, 19 
desmoyokin and, 19 
EM analysis and, 25 
epiplakin and, 13 
fibrinogen and, 20-21 
α-fibrous proteins and, 14
filaggrin/profilaggrin and, 17 
filament formation and, 12, 20-21 
glycine residues and, 18-19 
head-to-tail domains and, 23, 24-25 
HEAT repeats and, 30 
heptads and, 19-21, 25 
KAPs and, 21-22, 23
keratins and, 14, 16, 17, 31 
leucine zippers and, 19-20 
LRRs and, 29-30 
myosin/paramyosin and, 20-21, 25 
nestin and, 19 
NF-H and, 15-16, 19 
NMR and, 17, 31 
phosphorylation/dephosphorylation 
and, 16 
pitch length and, 19-20 
proteoglycans and, 29-30 
SH3 domains and, 22 
short/long exact repeats and, 15, 16 
structural/functional motifs in, 
11-12, 17, 18 
stutters and, 21 
tropomyosin and, 20-21, 24-25, 26-27 
type A, 15-17 
type B, 17-19 
type C, 19-23 
type D, 23-25, 26-27 
type E, 27-30 
VASP and, 21, 22 
WW domains and, 22
X-ray analysis and, 17, 18, 19, 25, 26-27, 31
zinc fingers and, 28-29 
Sequence repeats, type A 
in fibrous proteins, 15-17 
Sequence repeats, type B 
in fibrous proteins, 17-19 
Sequence repeats, type C 
in fibrous proteins, 19-23 
Sequence repeats, type D 
in fibrous proteins, 23-25, 26-27 
Sequence repeats, type E 
in fibrous proteins, 27-30 
Sequence specificity 
IFAP structure/function and, 150-151
Sequence-to-structure rules 
coiled coil design and, 81, 82-92 
S-folds 
in collagen fibrils, 349 
SH3 domains 
sequence repeats and, 22 
spectrin superfamily and, 
205, 209, 225-226 
Sheet proteins 
coiled coil structures and, 63, 65, 67 
Side chain reactions 
in collagen triple helix, 310, 312, 315 
Signaling 
fibrinogen/fibrin and, 260-261 
IFAP structure/function and, 
144, 173-178 
spectrin superfamily and, 221-222, 224 
Size distribution 
of collagen fibrils, 359-361 
Sorsby’s fundus dystrophy 
network-forming collagens and, 
382-384, 385 
Sources 
of elastin, 438-439 
of fibrinogen/fibrin, 262 
SPARC. See Secreted protein acidic rich in 
cysteine 
Spectrin repeats 
spectrin superfamily and, 203, 206, 207, 
208, 211, 217-219, 220 
Spectrin superfamily proteins 
ABDs of, 203, 208-210, 213-217, 
219, 220, 230 
ABS in, 215-217, 222 
actinins and, 203, 211, 215, 216, 217, 219, 
220-221, 222, 232 
actin-stress fibers and, 204, 211 
ankyrin and, 211, 220, 225, 230, 231 
calcium binding and, 222-224 
cell adhesion and, 211 
CH domains and, 204, 208, 211, 214, 215, 
216, 221-222, 230, 232 
costameres and, 213 
crosslinking and, 204, 207, 209, 
211-212, 222 
cytoskeletal proteins and, 204, 208, 
212-213, 220, 221, 224, 225, 230, 232 
DMD/BMD and, 212-213, 219, 227-229 
DRP and, 212-213 
dystrophins and, 205, 208,
212-213, 215, 217, 226, 227 
EF hands and, 203, 208, 209-210, 
222-224, 226 
evolution of, 205-208 
flexibility and, 215, 217, 219 
focal contacts and, 211 
FSGS and, 231-232 
function of, 210-213 
linkers and, 209, 212, 215, 218, 220 
lipid binding and, 224-225 
members of, 205 
molecular trafficking and, 204 
PH domain and, 205, 210, 224-225 
polyproline binding domains and, 
225-226 
RBC studies and, 
210-211, 218, 229-230 
rod domains of, 203, 208, 209, 
219, 220-221, 228, 231 
SH3 domain and, 205, 209, 225-226 
signaling and, 221-222, 224 
spectrin repeats and, 203, 206, 
207, 208, 211, 217-219, 220 
spherocytosis and, 218, 229-231 
structure of, 208-210 
utrophin and, 205, 206, 208, 210, 
213, 215, 217, 220, 224, 226, 227 
WW domain and, 205, 210, 224, 226
Z-discs and, 211, 224
ZZ domain and, 210, 226, 227, 228
Spectrins 

coiled coil structures and, 40, 42 

fibrous proteins and, 1, 5, 10 
Spectroscopic analysis 

coiled coil design and, 101 
Spherocytosis 

spectrin superfamily and, 218, 229-231 
Stability 

in coiled coil design, 90-91 

of coiled coil structures, 56-60 

of collagen triple helix, 302, 323-326 

of IF chains, 123, 126 

network-forming collagens and, 379, 381-382

Stabilization 

of collagen fibrils, 346 

collagen triple helix and, 317, 319-323 
Staggered alignment model 

in fibrillin microfibrils, 420-423, 424-426 
Staggering
in collagen triple helix, 302, 306, 
317, 318, 324 
in fibrillin microfibrils, 419, 420 
Stammers/stutters/skips
coiled coil structures and, 49, 51, 54, 55 
IF chains and, 124, 129, 131 
sequence repeats and, 21 
Standard model 
of coiled coil structures, 40-45 
STEM. See Scanning transmission EM mass 
mapping 
Stress/strain 
collagen fibrils and, 365-366 
elastin and, 446 
Structural discontinuities 
coiled coil structures and, 49, 50, 52-53, 
53-56 
Structural diversity 
coiled coil structures and, 60-70 
Structural domains 
IF chains and, 115-116 
Structural properties 
of collagen triple helix, 303-304 
of elastin, 447-448 
of fibrinogen/fibrin, 251-260 
Structural themes 
in network-forming collagens, 
396-399 
Structure 
of elastin, 438-439 
of spectrin superfamily proteins, 
208-210 
Subfibrillar structures 
of collagen fibrils, 356 
Supercoil axis 
coiled coil structures and, 
40, 43-44 
Supercoiling 
in network-forming collagens, 
377, 382, 387-388, 
392-396, 398, 399 
Superhelical distortion 
coiled coil structures and, 39 
Suprabundle structures 
collagen fibrils and, 365 
Suprafibrillar structures 
collagen fibrils and, 361-365 
Synthetic matrices 
of elastin, 452-453 
T


TAFI. See Thrombin activable fibrinolysis 
inhibitor 
Tail domains 
IF chains and, 132-136 
TAMD. See Theoretical axial mass 
distributions 
Tandem repeats 
in fibrillin microfibrils, 408 
TB motifs 
in fibrillin microfibrils, 407-408, 
409-410, 420 
Telopeptides 
in collagen fibrils, 346-349 
Tensile strength/creep 
collagen fibrils and, 359, 366-367 
Tetrabrachion stalks 
coiled coil structures and, 52, 53-54, 60 
TGFβ. See Transforming growth factor
Theoretical axial mass distributions (TAMD) 
fibrillin microfibrils and, 422, 425 
Thrombin 
fibrinogen/fibrin and, 247, 258, 
263-264, 277 
Thrombin activable fibrinolysis inhibitor 
(TAFI) 
fibrinogen/fibrin and, 274-275 
Thrombospondin 
fibrinogen/fibrin and, 274 
Tilts 
in collagen fibrils, 350, 353-354 
TIMP3. See Tissue inhibitor 
metalloproteinase 3 
Tissue inhibitor metalloproteinase 3 
(TIMP3) 
network-forming collagens and, 384, 385 
Tissue-type plasminogen activator (tPA) 
fibrinogen/fibrin and, 275-277 
tPA. See Tissue-type plasminogen activator 
Transcription 
of elastin, 441-442 
Transforming growth factor (TGFβ)
elastin and, 443 
fibrillin microfibrils and, 409-410, 
415-416, 428-429 
Transgenic mice 
fibrinogen/fibrin studies and, 278, 279 
Trichohyalin 
IFAP structure/function and, 169-170 
Trigger motif
IF chains and, 126, 131 
Trigger sequence 
coiled coil structures and, 59 
Tropoelastin 
elastin and, 437, 439, 440, 443, 444, 445 
fibrillin microfibrils and, 415, 416, 
421-422 
fibrous proteins and, 8-9 
Tropomyosin 
fibrous proteins and, 3 
sequence repeats and, 20-21, 24-25, 
26-27 
Tube structures 
coiled coil structures and, 63, 64 


U 


Ullrich syndrome 
network-forming collagens and, 376 
Utrophin 
spectrin superfamily and, 205, 206, 
208, 210, 213, 215, 217, 220, 
224, 226, 227 


V 


Vascular endothelial growth factor 
fibrinogen/fibrin and, 274-275 
Vasodilator-stimulated phosphoprotein 
(VASP) 
sequence repeats and, 21, 22 
VASP. See Vasodilator-stimulated 
phosphoprotein 
Vimentin 
IFAP structure/function and, 
163, 169, 176, 178-179 
von Willebrand domains 
fibrinogen/fibrin and, 274 
in network-forming collagens, 380-381


W 
Water 
collagen triple helix and, 
310, 312, 315 
WW domains
sequence repeats and, 22
spectrin superfamily and,
205, 210, 224, 226


X


X-ray analysis
coiled coil design and, 87, 98
coiled coil structures and, 37-38, 60
of collagen fibrils, 346-348,
349, 350, 351, 365, 366
collagen triple helix and,
302-303, 307, 311, 331
of fibrillin microfibrils, 408, 419
of fibrinogen/fibrin, 248, 251, 255, 256,
257-259, 265, 270, 284
of fibrous proteins, 2, 5-6, 9
of IF chains, 127
IFAP structure/function and, 150
of network-forming collagens, 388
sequence repeats and, 17, 18, 19, 25,
26-27, 31


Z


Z-discs
IFAP structure/function and,
163-164, 172-173
spectrin superfamily and, 211, 224
Zinc fingers
coiled coil structures and, 61
sequence repeats and, 28-29
ZZ domains
spectrin superfamily and,
210, 226, 227, 228


